Preface


It was our privilege to serve as the program chairs for CAV 2020, the 32nd 
International Conference on Computer-Aided Verification. CAV 2020 was held as a 
virtual conference during July 21-24, 2020. The tutorial day was on July 20, 2020, and
the pre-conference workshops were held during July 19-20, 2020. Due to the 
coronavirus disease (COVID-19) outbreak, all events took place online. 

CAV is an annual conference dedicated to the advancement of the theory and 
practice of computer-aided formal analysis methods for hardware and software sys- 
tems. The primary focus of CAV is to extend the frontiers of verification techniques by 
expanding to new domains such as security, quantum computing, and machine 
learning. This puts CAV at the cutting edge of formal methods research, and this year's 
program is a reflection of this commitment. 

CAV 2020 received a very high number of submissions (240). We accepted 18 tool 
papers, 4 case studies, and 43 regular papers, which amounts to an acceptance rate of 
roughly 27%. The accepted papers cover a wide spectrum of topics, from theoretical
results to applications of formal methods. These papers apply or extend formal methods 
to a wide range of domains such as concurrency, machine learning, and industrially 
deployed systems. The program featured invited talks by David Dill (Calibra) and 
Pushmeet Kohli (Google DeepMind) as well as invited tutorials by Tevfik Bultan 
(University of California, Santa Barbara) and Sriram Sankaranarayanan (University of 
Colorado at Boulder). Furthermore, we continued the tradition of Logic Lounge, a 
series of discussions on computer science topics targeting a general audience. 

In addition to the main conference, CAV 2020 hosted the following workshops: 
Numerical Software Verification (NSV), Verified Software: Theories, Tools, and 
Experiments (VSTTE), Verification of Neural Networks (VNN), Democratizing Soft- 
ware Verification, Synthesis (SYNT), Program Equivalence and Relational Reasoning 
(PERR), Formal Methods for ML-Enabled Autonomous Systems (FoMLAS), Formal 
Methods for Blockchains (FMBC), and Verification Mentoring Workshop (VMW). 

Organizing a flagship conference like CAV requires a great deal of effort from the 
community. The Program Committee (PC) for CAV 2020 consisted of 85 members — a 
committee of this size ensures that each member has to review a reasonable number of 
papers in the allotted time. In all, the committee members wrote over 960 reviews while 
investing significant effort to maintain and ensure the high quality of the conference 
program. We are grateful to the CAV 2020 PC for their outstanding efforts in evalu- 
ating the submissions and making sure that each paper got a fair chance. Like last 
year’s CAV, we made the artifact evaluation mandatory for tool paper submissions and 
optional but encouraged for the rest of the accepted papers. The Artifact Evaluation 
Committee consisted of 40 reviewers who put in significant effort to evaluate each 
artifact. The goal of this process was to provide constructive feedback to tool
developers and help make the research published in CAV more reproducible. The Artifact
Evaluation Committee was generally quite impressed by the quality of the artifacts,
and, in fact, all accepted tools passed the artifact evaluation. Among the accepted 
regular papers, 67% of the authors submitted an artifact, and 76% of these artifacts 
passed the evaluation. We are also very grateful to the Artifact Evaluation Committee 
for their hard work and dedication in evaluating the submitted artifacts. The evaluation 
and selection process involved thorough online PC discussions using the EasyChair 
conference management system, resulting in more than 2,000 comments. 

CAV 2020 would not have been possible without the tremendous help we received 
from several individuals, and we would like to thank everyone who helped make CAV 
2020 a success. First, we would like to thank Xinyu Wang and He Zhu for chairing the 
Artifact Evaluation Committee and Jyotirmoy Deshmukh for local arrangements. We 
also thank Zvonimir Rakamaric for chairing the workshop organization, Clark Barrett 
for managing sponsorship, Thomas Wies for arranging student fellowships, and Yakir 
Vizel for handling publicity. We also thank Roopsha Samanta for chairing the Men- 
toring Committee. Last but not least, we would like to thank members of the CAV 
Steering Committee (Kenneth McMillan, Aarti Gupta, Orna Grumberg, and Daniel 
Kroening) for helping us with several important aspects of organizing CAV 2020. 

We hope that you will find the proceedings of CAV 2020 scientifically interesting 
and thought-provoking!

Abstract. Computer science class enrollments have rapidly risen in the
past decade. With current class sizes, standard approaches to grading 
and providing personalized feedback are no longer possible and new tech- 
niques become both feasible and necessary. In this paper, we present the 
third version of Automata Tutor, a tool for helping teachers and students 
in large courses on automata and formal languages. The second version 
of Automata Tutor supported automatic grading and feedback for finite- 
automata constructions and has already been used by thousands of users 
in dozens of countries. This new version of Automata Tutor supports 
automated grading and feedback generation for a greatly extended vari- 
ety of new problems, including problems that ask students to create reg- 
ular expressions, context-free grammars, pushdown automata and Tur- 
ing machines corresponding to a given description, and problems about 
converting between equivalent models - e.g., from regular expressions to 
nondeterministic finite automata. Moreover, for several problems, this 
new version also enables teachers and students to automatically gener- 
ate new problem instances. We also present the results of a survey run 
on a class of 950 students, which shows very positive results about the 
usability and usefulness of the tool. 


Keywords: Theory of computation · Automata theory · Personalized
education · Automata tutor · Automated grading


1 Introduction 


Computer science (CS) class enrollments have been rapidly rising, e.g., CS enrollment
roughly triples per decade at Berkeley and Stanford [12] or TU Munich.
Both online and offline courses and degrees are being created to educate students
and professionals in computer science and these courses may soon have thou- 
sands of students attending a lecture, or tens of thousands following a Massive 
Online Open Course (MOOC). At these scales, standard approaches to grading 
and providing personalized feedback are no longer possible and new techniques 
become both feasible and necessary. Current approaches for handling this grow- 
ing student volume include reducing the complexity of assignments or relying on 
imprecise feedback and grading mechanisms. Simpler assessment mechanisms, 
e.g., multiple-choice questions, are easier to grade automatically but lack real- 
ism [8]. Designing better techniques for automated grading and feedback gener- 
ation is therefore a necessity. 

Recent advances in formal methods, including program synthesis and verifi- 
cation, can help teachers and students in verifiably correct ways that statistical 
or rule-based techniques cannot. For example, formal methods have been used 
to identify student errors and provide feedback for problems related to introductory
Python programming assignments [17], geometry [9,11], algebra [16],
logic [2], and automata [3,6]. In particular, for this last topic, the tool Automata 
Tutor v2 [7] has already been used by more than 9,000 students at more than 
30 universities in North America, South America, Europe, and Asia. 

In this paper, we present Automata Tutor v3, an online tool that extends
Automata Tutor v2 and uses techniques from program synthesis and deci- 
sion procedures to improve the quality and effectiveness of teaching courses on 
automata and formal languages. Besides being part of the standard CS cur- 
riculum, the concepts taught in these courses are rich in structure and applica- 
tions, e.g., in control theory, text editors, lexical analyzers, or models of software 
interfaces. Concrete topics in such curricula include automata, regular expres- 
sions, context-free grammars, and Turing machines. For problems and assign- 
ments related to these topics Automata Tutor v3 can automatically: (1) Detect 
whether the student’s solution is correct. (2) Detect different types of student’s 
mistakes and translate them into explanatory feedback. (3) If possible, generate 
new problems together with the corresponding solutions for teachers to use in 
class. 

Automata Tutor v3 greatly expands its predecessor Automata Tutor v2, 
which only provides ways to pose and solve problems for deterministic and non- 
deterministic finite automata constructions. This paper describes the new com- 
ponents introduced by Automata Tutor v3 and how this new version improves 
on its previous one. Its key advantages over competing tools are its breadth,
automatic generation and grading of exercises, infrastructure for use in large
courses, and useful feedback for students; in comparison, Autotool [13] uses a
text-based interface, JFLAP [14] gives rudimentary feedback, and Gradience [1]
gives none.

Since Automata Tutor has already been well received by teachers around the 
world, we believe that the readers from the CAV community will find great value 
in knowing about this new and fundamentally richer version of the tool and how
it can extensively help with teaching the automata and formal languages courses,
a task we know many of the attendees have to face on a yearly basis. 
Our contributions are the following: 


- Twelve new types of problems (added to the four problems from the 
previous version) that can be created by teachers and for which the tool 
can assign grades together with feedback to student attempts. While the 
previous version of Automata Tutor could only support problems involv- 
ing finite automata constructions, Automata Tutor v3 now supports prob- 
lems for proving language non-regularity using the pumping lemma, building 
regular expressions, context free grammars, pushdown automata and Turing 
machines, and conversions between such models. 

- Automatic problem generation for five types of problems; the modular
code makes it possible to add generation for all the other types. This feature
allows teachers to effortlessly create new assignments, and students to practice
by themselves with potentially infinitely many exercises.

- A new and improved user interface that allows teachers and students
to navigate the increased number of problem types and assignments. Furthermore,
each problem type comes with an intuitive user interface (e.g., for
drawing pushdown automata).

- An improved infrastructure for use in large courses, in particular,
incorporating login systems (e.g., LDAP or OAuth), providing a certified mapping
from users to students, and enabling teachers to grade homework or exams.

- A user study run on a class of 950 students to assess the effectiveness and
usability of Automata Tutor v3. In our survey, students report that they learned
to use the tool quickly, felt confident, enjoyed using Automata Tutor v3, and
found it easy to use. Most importantly, students found the feedback given by the
tool to be useful and claimed they understood more after using the tool and felt
better prepared for an upcoming exam. In our personal experience, the tool
saves us tens of thousands of corrections in each course.


2 Automata Tutor in a Nutshell 


Automata Tutor is an online education tool created to support courses teaching 
basic concepts in automata and formal languages [7]. In this section, we describe 
how Automata Tutor helps teachers run large courses and students learn effi- 
ciently in such courses. 


Learning Without Automata Tutor. Figure 1 schematically shows a student-
teacher interaction in a course taught without an online tutoring system. The 
teacher creates exercises, grades them manually, and (sometimes) manually pro- 
vides personalized feedback to the students. This type of interaction has many 
limitations: (1) it is asynchronous (i.e., the student has to wait a long time for
what is often little feedback) and does not scale to large classrooms, placing
a strenuous workload on teachers, (2) it does not guarantee consistency in
the assigned grades and feedback, and (3) it does not allow students to revise
their solutions upon receiving feedback, as the teachers often release a solution
to all students as part of the feedback and do not grade new submissions.

Fig. 1. Common structure of practical sessions for CS classes.

Another drawback of this interaction is the limited number of problems stu- 
dents can practice on. Because teachers do not have the resources to create many 
practice problems and provide feedback for them, students are often forced to 
search the Internet for old exams and practice sheets or even exercises from 
other universities. Due to the lack of feedback, this chaotic search for practice 
problems often ends up confusing the students rather than helping them. 


[Figure 2: diagram with the labels Teacher, Students, Automata Tutor v3, and
Automatic Problem Generation, and the problem types DFA, NFA, RE, PDA, CFG, TM.]

Fig. 2. Overview of Automata Tutor v3 (our contributions in green). The teacher
creates exercises on various topics. The students solve the exercises in a feedback cycle:
After each attempt they are automatically graded and get personalized feedback. The
teacher has access to the grade overview. For additional practice, students can generate
an unlimited number of new exercises using the automatic problem generation. (Color
figure online)


Learning with Automata Tutor. Figure 2 shows the improved interaction offered
by Automata Tutor v3. Here, a teacher creates the problem instances with the
help of the tool. The problems are then posed to the students and, no matter how
large a class is, Automata Tutor automatically grades the solution attempts of
students right when they are submitted and immediately gives detailed and
personalized feedback for each submission. If required, e.g. for a graded homework,
it is possible to restrict the number of attempts. Using this feedback, the
students can immediately try the problem again and learn from their mistakes. As
shown in a large user study run on the first version of Automata Tutor [6], this
fast feedback cycle is encouraging for students and results in students
spontaneously exploring more practice problems and engaging with the course material.
Additional practice is supported by the automatic problem generation, with the
same level of detailed and personalized feedback as before without increasing the
workload of the teacher. Furthermore, automatic problem generation can assist
the teacher in creating new exercises. Finally, whenever necessary, the teacher
can download an overview of all the grades.

[Screenshot: problem-creation form with the fields Alphabet (a b), Stack alphabet
(Z Y X; the first symbol is the initial one), Acceptance condition (final state),
Deterministic (DPDA), Short Description (AEC Test Problem), and Long Description
({a^n b^n | n > 0}), together with a PDA drawing canvas and a simulation option.]
Fig. 3. Creating a new problem of type "PDA Construction".

[Screenshot: Grade: 1/10. Feedback: your pda recognizes a superset of the given
language; the word "aab" is not in the given language, but it is recognized by your pda.]
Fig. 4. Feedback received when solving the problem created in Fig. 3.
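
The counterexample-style feedback of Fig. 4 can be illustrated with a small sketch. It is not the grading algorithm of Automata Tutor: it simply enumerates all words up to a bounded length and reports the first one on which a reference recognizer and a student's recognizer disagree. The function names, the faulty student attempt, and the length bound are assumptions made for illustration only.

```python
from itertools import product


def in_target_language(word: str) -> bool:
    """Reference language of the example problem: { a^n b^n | n > 0 }."""
    n = len(word) // 2
    return n > 0 and word == "a" * n + "b" * n


def student_accepts(word: str) -> bool:
    """Hypothetical faulty attempt: accepts a^m b^n whenever m >= n > 0."""
    m = len(word) - len(word.lstrip("a"))  # number of leading a's
    rest = word[m:]
    return len(rest) > 0 and set(rest) == {"b"} and m >= len(rest)


def find_counterexample(alphabet: str, max_len: int):
    """Return the first word, shortest first, on which the recognizers disagree."""
    for length in range(max_len + 1):
        for letters in product(alphabet, repeat=length):
            word = "".join(letters)
            if in_target_language(word) != student_accepts(word):
                return word
    return None


def feedback(alphabet: str = "ab", max_len: int = 6) -> str:
    """Turn a counterexample into a feedback sentence in the style of Fig. 4."""
    word = find_counterexample(alphabet, max_len)
    if word is None:
        return f"no difference found on words up to length {max_len}"
    if student_accepts(word):
        return f'the word "{word}" is not in the given language, but it is recognized by your pda'
    return f'the word "{word}" is in the given language, but it is not recognized by your pda'


print(feedback())
```

A bounded search like this can miss differences that only show up on longer words, so it is a teaching illustration rather than a decision procedure.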


Improved User Interface. Automata Tutor is an online tool which runs in the
most widely used browsers. A new collapsible navigation bar groups problems by topic,
facilitating quick access to exercises and displaying the structure of the course
(see Figure 6 in [5, Appendix B]). To create a new exercise, a teacher clicks the
“+” button and is presented with the view of Fig. 3. In this case, the drawing canvas
makes it easy to specify the sample solution pushdown automaton. Similarly,
when students solve this exercise, they also draw their solution attempt on the
canvas. After submitting, they receive their personalized feedback and grade (see
example in Fig. 4). For the automatic problem generation, a dropdown menu to
select the problem type and a slider to select the difficulty are displayed together
with the list of all problems the user has generated so far (see the screenshot in
Figure 7 in [5, Appendix B]).


3 Design 


3.1 University and Course Management 


While Automata Tutor can be used for independent online practice, one of the 
main advantages is its infrastructure for large university courses. To this end,
it is organized in courses. A course is created and supervised by one or more
teachers. Together, they can create, test and edit exercises. The students can- 
not immediately see the problems, but only after the teachers have decided to 
pose them. This involves setting the maximum number of points, the number of 
allowed attempts as well as the start and end date. 

To use Automata Tutor, students must have an account. One can either
register by email or, in case the university supports it, log in with an external login
service like LDAP or OAuth. When the login service of the university is used,
teachers get a certified mapping from users to students, which enables them to
use Automata Tutor v3 for grading homework or exams.

Students can enroll in a course using a password. Enrolled students see all 
posed problems and can solve them (using the allowed number of attempts). The 
final grade can be accessed by the teachers in the grade overview. 


3.2 New Problem Types 


In this section, we list the problem types newly added to Automata Tutor v3. 
They are all part of the course [10] and a detailed description of each problem 
can be found in [5, Appendix A], including the basic theoretical concept, how 
a student can solve such a problem, what a teacher has to provide to create a 
problem, the idea of the grading algorithm, and what feedback the tool gives. 


RE/CFG/PDA Words: Finding words in or not in the language of a regular 
expression, context free grammar or pushdown automaton. 

RE/CFG/PDA Construction: Given a description of a language, construct a 
regular expression, context free grammar or pushdown automaton. 

RE to NFA: Given a regular expression, construct a nondeterministic finite
automaton.

Myhill-Nerode Equivalence Classes: There are two subtypes: either, given a reg- 
ular expression and two words, find out whether they are equivalent w.r.t. 
the language, or, given a regular expression and a word, find further words 
in the same equivalence class. 

Pumping-Lemma Game: Given a language, the student has to guess whether it 
is regular or not and then plays the game as one of the quantifiers. 

Find Derivation: Given a context free grammar and a word, the student has to 
specify a derivation of that word. 

CNF: Given a context free grammar, the student has to transform it into Chom- 
sky Normal Form. 

CYK: Given a context free grammar in CNF and a word, the student has to 
decide whether the word is in the language of the grammar by using the 
Cocke-Younger-Kasami algorithm.

While to TM: Given a while-program (a Turing-complete programming language 
with very restricted syntax), construct a (multi-tape) Turing machine with 
the same input-output behaviour. 
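
As an illustration of the CYK problem type, the following sketch implements the textbook Cocke-Younger-Kasami table construction for a grammar in Chomsky Normal Form. It is not taken from the Automata Tutor code base; the rule encoding (`unit_rules`, `binary_rules`) and the example grammar for { a^n b^n | n > 0 } are assumptions chosen for this example.

```python
def cyk(word, unit_rules, binary_rules, start="S"):
    """Decide membership of `word` for a grammar in Chomsky Normal Form.

    unit_rules:   dict mapping a terminal t to the set of nonterminals A with A -> t
    binary_rules: dict mapping a pair (B, C) to the set of nonterminals A with A -> B C
    """
    n = len(word)
    if n == 0:
        return False  # the empty word is not handled in this sketch
    # table[i][l - 1] holds the nonterminals that derive word[i:i + l]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, symbol in enumerate(word):
        table[i][0] = set(unit_rules.get(symbol, ()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for b in left:
                    for c in right:
                        table[i][length - 1] |= binary_rules.get((b, c), set())
    return start in table[0][n - 1]


# Example CNF grammar: S -> A B | A T,  T -> S B,  A -> a,  B -> b
UNIT = {"a": {"A"}, "b": {"B"}}
BINARY = {("A", "B"): {"S"}, ("A", "T"): {"S"}, ("S", "B"): {"T"}}

print(cyk("aabb", UNIT, BINARY))
print(cyk("aab", UNIT, BINARY))
```

In an exercise of this type, the student fills in the same table by hand, so each cell of `table` corresponds to one entry the student has to provide.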

3.3 Automatic Problem Generation


Automatic Problem Generation (APG) allows one to generate new exercises of 
a requested difficulty level and problem type. This allows students to practice 
independently and supports teachers when creating new exercises. While APG 
is currently implemented for four CFG problem types and for the problem type 
“While to TM”, it can be easily extended to other problem types by providing 
the following components: 


— Procedure for generating exercises at random either from given basic 
building blocks or from scratch. 

— A “quality” metric qual(E) for assessing the quality of the generated exer- 
cise E, ranging from trivial or infeasible to realistic. 

— A “difficulty” metric diff (E) for assessing the difficulty of E. 


Given these components, Automata Tutor generates a new problem with a given
minimum difficulty $d_{\min}$ and maximum difficulty $d_{\max}$ as follows. Firstly, 100
random exercises are generated. Secondly, Automata Tutor chooses exercises E
with the best quality such that $d_{\min} \le \mathit{diff}(E) \le d_{\max}$.
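
The generate-and-select scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the tool's implementation: the function `generate_problem`, the toy generator, and the toy metrics are hypothetical stand-ins for the three components (random generation, quality metric, difficulty metric).

```python
import random


def generate_problem(generate, qual, diff, d_min, d_max, samples=100, seed=None):
    """Sample exercises, keep those in the difficulty range, return the best one."""
    rng = random.Random(seed)
    candidates = [generate(rng) for _ in range(samples)]
    in_range = [e for e in candidates if d_min <= diff(e) <= d_max]
    if not in_range:
        return None  # no sampled exercise has the requested difficulty
    return max(in_range, key=qual)


# Toy stand-ins: an "exercise" is a tuple of production strings.
def toy_generate(rng):
    count = rng.randint(1, 8)
    return tuple(
        rng.choice("SAB") + " -> "
        + "".join(rng.choice("abSAB") for _ in range(rng.randint(1, 3)))
        for _ in range(count)
    )


def toy_diff(exercise):
    return len(exercise)  # difficulty grows with the number of productions


def toy_qual(exercise):
    return len(set(exercise))  # prefer exercises without duplicate productions


print(generate_problem(toy_generate, toy_qual, toy_diff, d_min=3, d_max=5, seed=0))
```

Returning `None` when no candidate falls into the requested range is a design choice of this sketch; a caller could instead resample or widen the range.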

Concretely, for the CFG problem types, CFGs with random productions are 
generated and sanitized. Resulting CFGs that do not accept any words or have 
too few productions are excluded using the quality metric. The difficulty metric 
always depends on the number of productions; additionally, depending on the 
exact problem type, further criteria are taken into account. 

For the problem type “While to TM” we use an approach similar to the
one suggested in existing tools for automatic problem generation [15,18]: We
handcrafted several base programs of different difficulty levels. In the
generation process, the syntax tree of such a base program is abstracted and
certain modifying operations are executed; these change the program without
affecting the difficulty too much. For example, we choose different variables, switch the
order of if-else branches or change arithmetic operators. Then several programs
are generated and those of bad quality are filtered out. A program is of bad 
quality if its language is trivially small or if it contains infinite loops; since 
detecting these properties is undecidable, we employ heuristics such as checking 
that the loops terminate for all inputs up to a certain size with a certain timeout. 
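
A bounded termination check of this kind might look as follows. The sketch is only an assumption about how such a heuristic can be structured: it uses a tiny hypothetical while-language encoded as Python tuples, and it replaces the wall-clock timeout by a step budget so that the result is deterministic.

```python
class BudgetExceeded(Exception):
    """Raised when a program uses more steps than the allowed budget."""


def run(program, env, budget):
    """Execute a list of statements; return the remaining step budget.

    Statements (hypothetical encoding):
      ("assign", var, expr)   with expr: env -> int
      ("while", cond, body)   with cond: env -> bool and body: list of statements
    """
    for stmt in program:
        if stmt[0] == "assign":
            _, var, expr = stmt
            budget -= 1
            if budget < 0:
                raise BudgetExceeded
            env[var] = expr(env)
        elif stmt[0] == "while":
            _, cond, body = stmt
            while cond(env):
                budget -= 1
                if budget < 0:
                    raise BudgetExceeded
                budget = run(body, env, budget)
        else:
            raise ValueError(f"unknown statement: {stmt[0]}")
    return budget


def probably_terminates(program, max_input, budget=10_000):
    """Heuristic: the program halts within the budget on every input 0..max_input."""
    for x in range(max_input + 1):
        try:
            run(program, {"x": x, "y": 0}, budget)
        except BudgetExceeded:
            return False
    return True


# y := 2 * x by counting x down to zero: terminates on every input.
countdown = [("while", lambda e: e["x"] > 0,
              [("assign", "x", lambda e: e["x"] - 1),
               ("assign", "y", lambda e: e["y"] + 2)])]

# Never reaches x == 1 when started from an even x: loops forever on input 0.
looping = [("while", lambda e: e["x"] != 1,
            [("assign", "x", lambda e: e["x"] + 2)])]

print(probably_terminates(countdown, max_input=20))
print(probably_terminates(looping, max_input=20))
```

Passing the check does not prove termination on larger inputs, which is why it serves only as a filter for obviously bad candidates.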


4 Implementation and Scalability 


Automata Tutor v3 is open source and it consists of a frontend, a backend, and 
a database. It also provides a developer’s manual for creating new exercises. 

The frontend, written in Scala, renders the webpage. The drawing canvases
for the different automata and the Turing machines rely on JavaScript. The frontend
and backend communicate using XML objects.

The backend, written in C#, contains methods to unpack the XML sent by the
frontend and to compute the grade and feedback for solutions. It is also used to
check the syntax of exercises and for the automatic problem generation. It relies
on AutomataDotNet², a library that provides efficient algorithms for automata
and regular expressions.

The database keeps track of existing users, problems and courses. It uses the
H2 Database Engine. 

All the new parts of Automata Tutor v3 were developed and tested over the 
last 3 years at TU Munich, where they were used to support the introductory 
theoretical computer science course. This local deployment served as an impor- 
tant test-bed before publicly deploying the tool online at large scale. Due to its 
modular structure, the tool is easily scalable by having multiple frontends and 
backends together with a load distributor. This approach has successfully scaled 
to 950 concurrent student users; for this, we used 7 virtual machines: 3 host- 
ing frontends, 3 hosting backends (each with 2 cores 2.60 GHz Intel(R) Xeon(R) 
CPU and 4 GB RAM), and 1 for load distribution and the database (with 4 such 
cores and 8 GB RAM). We will scale the number of machines based on need. 


5 Evaluation and User Study 


Large-Class Deployment. In the latest iteration of the TU Munich course 
in 2019, we used Automata Tutor v3 (in the following denoted as AT) in a 
mandatory homework system for a course with about 950 students; the home- 
work system also included written and programming exercises. In total, we posed 
79 problems consisting of 18 homework and 61 practice problems. The teachers 
saved themselves the effort of correcting 26,535 homework exercises, and the 
students used AT to get personalized feedback for their work 76,507 times. On 
average, each student who used AT did so 107 times. 


Student Survey Results. At the end of the course, we conducted an 
anonymized survey, based on the System Usability Survey [4]. 14.6% of the 
students in the course answered the survey, which is an ordinary rate of return 
for an online questionnaire, especially given that there was no incentive. The 
students were given statements to judge on a Likert scale from 1 to 5 (strongly 
disagree to strongly agree). We define “the students agreed with the following
statement” to mean that the average and median scores were at least 4 and less
than 10% of the students chose a score below 3. Dually, if the median and average
scores were at most 2 and less than 10% of the students chose a score greater
than 3, we say that they “agreed with the negation of the statement”. For all
statements that do not satisfy either of the
criteria, we report mixed answers. The full survey results can be found in [5, 
Appendix C]. 
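
The classification rule defined above can be written down directly. The following sketch is an illustration, not the evaluation script used for the survey; the function name `classify` and the sample score lists are made up for the example.

```python
from statistics import mean, median


def classify(scores):
    """Classify Likert scores (1 to 5) for one statement by the rule above."""
    n = len(scores)
    avg, med = mean(scores), median(scores)
    share_below_3 = sum(s < 3 for s in scores) / n
    share_above_3 = sum(s > 3 for s in scores) / n
    if avg >= 4 and med >= 4 and share_below_3 < 0.10:
        return "agreed"
    if avg <= 2 and med <= 2 and share_above_3 < 0.10:
        return "agreed with the negation"
    return "mixed"


# Invented score lists, one per outcome.
print(classify([5, 4, 4, 5, 4, 4, 5, 4, 4, 3]))
print(classify([1, 2, 1, 2, 2, 1, 1, 2, 2, 1]))
print(classify([5, 5, 4, 4, 3, 2, 2, 1, 4, 5]))
```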


Usability. Regarding the usability of the tool, the students agreed with the 
following statements: 


- I quickly learned to use the AT.
- I do not need assistance to use the AT.
- I feel confident using the AT.
- The AT is easy to use.
- I enjoy using the AT/the AT is fun to use.

² https://github.com/AutomataDotNet/Automata


However, there were many valuable suggestions for improvements, many of
which we have implemented since then. Moreover, the survey also revealed room
for improvement, in particular for streamlining, as documented by the following
statements where the answers were more mixed:


- The AT is unnecessarily complex.
- The canvas for drawing is intuitive.
- The use of AT is self-explanatory.


Usefulness. Regarding how useful AT was for learning, the students agreed with 
the following statements: 


- I understand more after using the AT.
- I prefer using the AT to using pen and paper exercises (12.9% disagreed, but
  median and average are 4).
- The feedback of the AT was helpful and instructive.
- The exercises within the AT are well-designed.
- The AT fits in well with the programming tasks and written homework.
- The AT did not hinder my learning.
- I feel better prepared for the exam after using AT.
- The feedback of the AT was not misleading/confusing.


Note that there are no statements with mixed or negative answers regarding the 
usefulness. Additionally, as shown in Fig. 5, when we asked students about their
preferred means of learning, AT gets the highest approval rate, being preferred 
to written or programming exercises as well as lectures. 


What are your preferred means of learning?
(Multiple answers possible.)

  Lecture                                                  30.2%
  Written exercises                                        67.6%
  Programming                                              56.8%
  Automata Tutor Tool                                      76.3%
  Individual learning (via script, book, or videostream)   56.8%
  Group discussion/learning group                          32.4%


Fig. 5. Question from the survey we conducted to evaluate Automata Tutor, showing 
that the tool is preferred by a majority of students. 


Overall, this class deployment of Automata Tutor v3 and the accompanying
survey were great successes and showed that the tool is of great value for
both students and teachers, in particular for such a large course.

6 Conclusion


This paper presents the third version of Automata Tutor, an online tool helping 
teachers and students in large automata/computation theory courses. Automata 
Tutor v3 now supports automated grading and feedback generation for a wide 
variety of problems and, for some of them, even automatic generation of new 
problem instances. Furthermore, it is easy to extend and we invite the community 
to contribute by implementing further exercises. Finally, our experience shows
that Automata Tutor v3 greatly improves the economic aspects of teaching, as
it scales effortlessly with the number of students.

Earlier versions of Automata Tutor have already been adopted by thousands 
of students at dozens of schools and we hope this paper allows Automata Tutor v3 
to help even more students and teachers around the world.

Abstract. We present the second generation of the tool Seminator that
transforms transition-based generalized Büchi automata (TGBAs) into
equivalent semi-deterministic automata. The tool has been extended with
numerous optimizations and produces considerably smaller automata
than its first version. In connection with the state-of-the-art LTL to
TGBAs translator Spot, Seminator 2 produces smaller (on average)
semi-deterministic automata than the direct LTL to semi-deterministic
automata translator ltl2ldgba of the Owl library. Further, Seminator 2
has been extended with an improved NCSB complementation procedure 
for semi-deterministic automata, providing a new way to complement 
automata that is competitive with state-of-the-art complementation tools. 


1 Introduction 


Semi-deterministic [24] automata are automata where each accepting run makes 
only finitely many nondeterministic choices. The merit of this interstage between 
deterministic and nondeterministic automata comes from two facts known since 
the late 1980s. First, every nondeterministic Büchi automaton with n states can
be transformed into an equivalent semi-deterministic Büchi automaton with at
most $4^n$ states [7,24]. Note that asymptotically optimal determinization
procedures transform nondeterministic Büchi automata to deterministic automata
with $2^{O(n \log n)}$ states [24] and with a more complex (typically Rabin) acceptance
condition, as deterministic Büchi automata are strictly less expressive. Second,
some algorithms cannot handle nondeterministic automata, but they can handle 
semi-deterministic ones; for example, algorithms for qualitative model checking 
of Markov decision processes (MDPs) [7,29]. 
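
Semi-determinism is commonly checked through a structural criterion: every state reachable from an accepting state must be deterministic. The sketch below applies this criterion to a state-based Büchi automaton. The dictionary encoding of the transition relation and the example automaton are assumptions made for illustration; the code is unrelated to the implementation of Seminator.

```python
from collections import deque


def reachable_from(sources, transitions):
    """Return all states reachable from `sources` (including the sources)."""
    seen = set(sources)
    queue = deque(seen)
    while queue:
        state = queue.popleft()
        for (src, _letter), successors in transitions.items():
            if src == state:
                for target in successors:
                    if target not in seen:
                        seen.add(target)
                        queue.append(target)
    return seen


def is_semi_deterministic(transitions, accepting):
    """transitions: dict (state, letter) -> set of successor states."""
    deterministic_part = reachable_from(accepting, transitions)
    return all(
        len(successors) <= 1
        for (src, _letter), successors in transitions.items()
        if src in deterministic_part
    )


# State 0 guesses when the suffix consisting only of a's begins; state 1 is accepting.
eventually_always_a = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "a"): {1}}
print(is_semi_deterministic(eventually_always_a, accepting={1}))

# Allowing a jump back from state 1 to state 0 makes state 0 reachable from an accepting state.
with_back_edge = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "a"): {0, 1}}
print(is_semi_deterministic(with_back_edge, accepting={1}))
```

In the first automaton the only nondeterministic choice happens before the accepting state is entered, which matches the description that accepting runs make only finitely many nondeterministic choices.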

For theoreticians, the difference between the complexity of determinization 
and semi-determinization is not dramatic—both constructions are exponential. 
However, the difference is important for authors and users of practical automata- 
based tools—automata size and the complexity of their acceptance condition often 
have a significant impact on tool performance. This latter perspective has recently
initiated another wave of research on semi-deterministic automata. Since 2015,
many new results have been published: several direct translations of LTL to semi-
deterministic automata [11, 15, 16, 26], specialized complementation constructions
for semi-deterministic automata [4,6], algorithms for quantitative model checking
of MDPs based on semi-deterministic automata [13, 25], a transformation of semi-
deterministic automata to deterministic parity automata [10], and reinforcement
learning of control policy using semi-deterministic automata [21].

© The Author(s) 2020

In 2017, we introduced Seminator 1.1 [5], a tool that transforms nondeter- 
ministic automata to semi-deterministic ones. The original semi-determinization 
procedure of Courcoubetis and Yannakakis [7] works with standard Büchi 
automata (BAs). Seminator 1.1 extends this construction to handle more
compact automata, namely transition-based Büchi automata (TBAs) and
transition-based generalized Büchi automata (TGBAs). TBAs use accepting
transitions instead of accepting states, and TGBAs have several sets of
accepting transitions, each of which must be visited infinitely often by
accepting runs. The main novelty of Seminator 1.1 was that it performed
degeneralization and semi-determinization of a TGBA simultaneously. As a
result, it could translate TGBAs to smaller semi-deterministic automata than
nba2ldba [26], which is (to the best of our knowledge) the only other tool for
automata semi-determinization. This tool only accepts BAs as input, and thus
TGBAs must be degeneralized before nba2ldba is called.

Moreover, in connection with the LTL to TGBAs translator ltl2tgba of 
Spot [8], Seminator 1.1 provided a translation of LTL to semi-deterministic 
automata that can compete with the direct LTL to semi-deterministic TGBAs 
translator ltl2ldba [26]. More precisely, our experiments [5] showed that the
combination of ltl2tgba and Seminator 1.1 outperforms ltl2ldba on LTL formulas
that ltl2tgba translates directly to deterministic or semi-deterministic TGBAs
(i.e., when Seminator has no work to do), while ltl2ldba produced (on average)
smaller semi-deterministic TGBAs on the remaining LTL formulas (i.e., when the
TGBA produced by ltl2tgba has to be semi-determinized by Seminator).

This paper presents Seminator 2, which changes the situation. With many
improvements in semi-determinization, the combination of ltl2tgba and
Seminator 2 now translates LTL to smaller (on average) semi-deterministic TGBAs
than ltl2ldba even for the cases when ltl2tgba produces a TGBA that is not
semi-deterministic. Moreover, this holds even when we compare to ltl2ldgba,
which is the current successor of ltl2ldba distributed with Owl [19].

Further, Seminator 2 now provides a new feature: complementation of 
TGBAs. Seminator 2 chains semi-determinization with the complementation 
algorithm called NCSB [4,6], which is tailored for semi-deterministic BAs. Our 
experiments show that the complementation in Seminator 2 is fully competitive 
with complementations implemented in state-of-the-art tools [1,8, 20,23,30]. 


2 Improvements in Semi-determinization 


First of all, we recall the definition of semi-deterministic automata and principles 
of the semi-determinization procedure implemented in Seminator 1.1 [5]. 
Fig. 1. Structure of a semi-deterministic automaton. The deterministic part contains
all accepting transitions and states reachable from them. Cut-transitions are magenta. 


Let $\mathcal{A} = (Q, \Sigma, \delta, q_0, \{F_1, \ldots, F_n\})$ be a TGBA over alphabet $\Sigma$, with a finite
set of states $Q$, a transition relation $\delta \subseteq Q \times \Sigma \times Q$, an initial state $q_0 \in Q$,
and sets of accepting transitions $F_1, \ldots, F_n \subseteq \delta$. Then $\mathcal{A}$ is semi-deterministic
if there exists a subset $Q_D \subseteq Q$ such that (i) each transition from $Q_D$ goes
back to $Q_D$ (i.e., $\delta \cap (Q_D \times \Sigma \times (Q \setminus Q_D)) = \emptyset$), (ii) all states of $Q_D$ are
deterministic (i.e., for each $q \in Q_D$ and $a \in \Sigma$ there is at most one $q'$ such that
$(q, a, q') \in \delta$), and (iii) each accepting transition starts in a state of $Q_D$ (i.e.,
$F_1 \cup \cdots \cup F_n \subseteq Q_D \times \Sigma \times Q_D$).

The part of $\mathcal{A}$ delimited by states of $Q_D$ is called deterministic, while the
part formed by the remaining states $Q \setminus Q_D$ is called nondeterministic, although
it could contain deterministic states too. The transitions leading from the nonde- 
terministic part to the deterministic one are called cut-transitions. The structure 
of a semi-deterministic automaton is depicted in Fig. 1. 
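
Conditions (i)-(iii) can be checked mechanically. The following Python sketch is not taken from Seminator; it assumes a TGBA given as an explicit set of transitions (q, a, q'), and the helper names is_semi_deterministic and cut_transitions are hypothetical. It computes the smallest candidate for the deterministic part (the endpoints of accepting transitions, closed under successors) and then tests determinism of those states. If this smallest candidate fails condition (ii), every larger candidate fails as well.

```python
def is_semi_deterministic(delta, acc_sets):
    """Check conditions (i)-(iii) on an explicitly represented TGBA.

    delta    -- set of transitions (q, a, q2)
    acc_sets -- list of sets of accepting transitions, each a subset of delta
    Returns (True, q_d) with the smallest suitable deterministic part,
    or (False, None) if no such part exists.
    """
    # (iii) forces both endpoints of every accepting transition into q_d.
    q_d = set()
    for acc in acc_sets:
        for (q, _a, q2) in acc:
            q_d.update((q, q2))
    # (i) forces q_d to be closed under successors.
    todo = list(q_d)
    while todo:
        q = todo.pop()
        for (s, _a, t) in delta:
            if s == q and t not in q_d:
                q_d.add(t)
                todo.append(t)
    # (ii) each state of q_d has at most one successor per letter.
    seen = set()
    for (s, a, _t) in delta:
        if s in q_d:
            if (s, a) in seen:
                return False, None
            seen.add((s, a))
    return True, q_d


def cut_transitions(delta, q_d):
    """Transitions leading from the nondeterministic part into q_d."""
    return {(s, a, t) for (s, a, t) in delta if s not in q_d and t in q_d}
```

For a two-state automaton with transitions (0, a, 0), (0, a, 1), (1, a, 1) and the single accepting transition (1, a, 1), the sketch reports the deterministic part {1} and the cut-transition (0, a, 1).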

Intuitively, a TGBA $\mathcal{A}$ with a set of states $Q$ and a single set of accepting
transitions $F$ can be transformed into a semi-deterministic TBA $\mathcal{B}$ as follows.
First, we use a copy of $\mathcal{A}$ as the nondeterministic part of $\mathcal{B}$. The deterministic
part of $\mathcal{B}$ has states of the form $(M, N)$ such that $Q \supseteq M \supseteq N$ and $M \neq \emptyset$.
Every accepting transition $(q, a, q') \in F$ induces a cut-transition $(q, a, (\{q'\}, \emptyset))$
of $\mathcal{B}$. The deterministic part is then constructed to track all runs of $\mathcal{A}$ from each
such state $q'$ using the powerset construction. More precisely, the first element
of $(M, N)$ tracks all runs while the second element tracks only the runs that
passed some accepting transition of $F$. Each transition of the deterministic part
that would reach a state where $M = N$ (a so-called breakpoint) is replaced with
an accepting transition of $\mathcal{B}$ leading to state $(M, N')$, where $N'$ tracks only the
runs of $\mathcal{A}$ passing an accepting transition of $F$ in the current step.
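
The construction just described can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not Seminator's implementation: the automaton is an explicit set of transitions (q, a, q'), there is a single acceptance set, no optimization is applied, and the function name semi_determinize_sketch is hypothetical.

```python
from collections import deque


def post(states, letter, transitions):
    """Targets of `transitions` that leave `states` under `letter`."""
    return frozenset(t for (s, a, t) in transitions
                     if s in states and a == letter)


def semi_determinize_sketch(alphabet, delta, acc):
    """Breakpoint-based semi-determinization of a TBA (single set `acc`).

    Returns (edges, det_states). Each edge is (src, letter, dst, accepting);
    states of the deterministic part are pairs (M, N) of frozensets.
    """
    # Nondeterministic part: a copy of the input without accepting marks.
    edges = {(q, a, q2, False) for (q, a, q2) in delta}
    det_states, todo = set(), deque()
    # Every accepting transition (q, a, q2) induces a cut-transition
    # from q to the deterministic state ({q2}, {}).
    for (q, a, q2) in acc:
        target = (frozenset([q2]), frozenset())
        edges.add((q, a, target, False))
        if target not in det_states:
            det_states.add(target)
            todo.append(target)
    # Deterministic part: powerset construction with breakpoints.
    while todo:
        m, n = todo.popleft()
        for a in alphabet:
            m2 = post(m, a, delta)
            if not m2:
                continue  # M must stay non-empty
            fresh = post(m, a, acc)          # runs accepting in this step
            n2 = post(n, a, delta) | fresh
            breakpoint_reached = (m2 == n2)
            if breakpoint_reached:
                n2 = fresh                   # restart tracking in N
            dst = (m2, n2)
            edges.add(((m, n), a, dst, breakpoint_reached))
            if dst not in det_states:
                det_states.add(dst)
                todo.append(dst)
    return edges, det_states
```

On the automaton with transitions (0, a, 0), (0, b, 0), (0, a, 1), (1, a, 1) and accepting transition (1, a, 1), the sketch creates one cut-transition and two deterministic states, ({1}, {}) and ({1}, {1}).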

Seminator 1.1 extended this procedure to construct a semi-deterministic TBA
even for a TGBA with multiple acceptance sets $F_1, \ldots, F_n$. States of the
deterministic part are now triples $(M, N, i)$, where $i \in \{0, \ldots, n-1\}$ is called the level
and has a similar semantics as in degeneralization. Cut-transitions are induced
by transitions of $F_n$, and they lead to states of the form $(\{q'\}, \emptyset, 0)$. The level
$i$ says that $N$ tracks runs that passed a transition of $F_{i+1}$ since the last level
change. When the deterministic part reaches a state $(M, N, i)$ with $M = N$, we
change the level to $i' = (i + 1) \bmod n$ and modify $N$ to track only runs passing
$F_{i'+1}$ in the current step. Transitions changing the level are accepting.
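
A single step of the deterministic part with levels can be sketched as follows. The code is an illustration only and makes the same assumptions as an explicit-set representation allows: transitions are triples (q, a, q'), the Python list acc_sets stores the set with index i+1 at position i, and the name level_successor is hypothetical.

```python
def post(states, letter, transitions):
    """Targets of `transitions` that leave `states` under `letter`."""
    return frozenset(t for (s, a, t) in transitions
                     if s in states and a == letter)


def level_successor(state, letter, delta, acc_sets):
    """Successor of the deterministic state (M, N, i) under `letter`.

    Returns ((M2, N2, i2), accepting), or None if M would become empty.
    """
    m, n, i = state
    m2 = post(m, letter, delta)
    if not m2:
        return None
    # At level i, N collects runs that passed the set stored at acc_sets[i].
    n2 = post(n, letter, delta) | post(m, letter, acc_sets[i])
    if m2 != n2:
        return (m2, n2, i), False
    # Breakpoint: move to the next level and restart N with the runs that
    # pass the next acceptance set in this very step.
    i2 = (i + 1) % len(acc_sets)
    return (m2, post(m, letter, acc_sets[i2]), i2), True
```

With a single looping transition (0, a, 0) that belongs to the first of two acceptance sets only, the state ({0}, {}, 0) moves to level 1 via an accepting transition and then stays at level 1 without accepting again.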

A precise description of these semi-determinization procedures and proofs of 
their correctness can be found in Blahoudek's dissertation [3]. Now we briefly 
explain the most important optimizations added in Seminator 2 (a journal
paper with their formal description is in preparation). Each optimization can be
enabled or disabled by the corresponding option. All of them are enabled by default.


--scc-aware approach identifies, for each cut-transition, the strongly connected 
component (SCC) of A that contains the target of the transition triggering 
the cut-transition. The sets M, N then track only runs staying in this SCC. 

--reuse-deterministic treats in a specific way each deterministic SCC from 
which only deterministic SCCs are reachable in A: it (i) does not include 
them in the nondeterministic part, and (ii) copies them (and their succes- 
sors) in the deterministic part as they are, including the original acceptance 
transitions. This optimization can result in a semi-deterministic TGBA with 
multiple acceptance sets on output. 

--cut-always changes the policy when cut-transitions are created: they are
now triggered by all transitions of A with the target state in an accepting
SCC.

--powerset-on-cut applies the powerset construction when computing targets
of cut-transitions. The target of a cut-transition leading from q is
constructed in the same way as the successor of the hypothetical state
$(\{q\}, \emptyset, 0)$ of the deterministic part.

--skip-levels is a variant of the level jumping trick from TGBA
degeneralization [2]. Roughly speaking, a single transition in the deterministic
part can change the level i directly to i + j where j > 1 if all runs passed
acceptance transitions from all the sets $F_{i+1}, \ldots, F_{i+j}$ in the current step.

--jump-to-bottommost makes sure that all cut-transitions leading to states 
with the same M component lead to the same state (M, N,i) for some N 
and i. It relies on the fact that each run takes only one cut-transition, and 
thus only the component M of the cut-transition's target state is important 
for determining the acceptance of the run. During the original construction,
many states of the form (M, N', i') may appear in different SCCs. After the
construction finishes, this optimization redirects each cut-transition leading
to (M, N', i') to some state of the form (M, N, i) that belongs to the
bottommost SCC (in a topological ordering of the SCCs) that contains such a
state. This is inspired by a similar trick used by Křetínský et al. [18] in a
different context.

--powerset-for-weak simplifies the construction for weak accepting SCCs
(i.e., SCCs where all cycles are accepting) of A. For such SCCs it just
applies the powerset construction (builds states of the form M instead of
triples (M, N, i)) with all transitions accepting in the deterministic part.
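
To make one of these options concrete, the following Python sketch contrasts the default targets of cut-transitions with the targets under --powerset-on-cut. It is our reading of the description above for the simplified case of a single acceptance set (so the level component is omitted), not code from Seminator; the function name cut_targets and the explicit-set representation are assumptions.

```python
def post(states, letter, transitions):
    """Targets of `transitions` that leave `states` under `letter`."""
    return frozenset(t for (s, a, t) in transitions
                     if s in states and a == letter)


def cut_targets(q, letter, delta, acc, powerset_on_cut):
    """Targets (M, N) of cut-transitions leaving q under `letter`."""
    fresh = post({q}, letter, acc)
    if not fresh:
        return set()  # no accepting transition, hence no cut-transition
    if not powerset_on_cut:
        # Default: one target ({q2}, {}) per accepting transition.
        return {(frozenset([t]), frozenset()) for t in fresh}
    # Powerset on cut: behave like the successor of the hypothetical
    # state ({q}, {}). M collects all successors of q, N those reached
    # via accepting transitions (a breakpoint would reset N to the same set).
    return {(post({q}, letter, delta), fresh)}
```

In this sketch the option replaces several singleton targets by one target whose first component already contains all successors of q under the given letter.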


Note that Seminator 1.1 can produce a semi-deterministic TGBA with multiple 
acceptance sets only when it gets a semi-deterministic TGBA as input. Semina- 
tor 2 produces such automata more often due to --reuse-deterministic. 


3 Implementation and Usage 


Seminator 2 is an almost complete rewrite of Seminator [5], and is still distributed 
under the GNU GPL 3.0 license. Its distribution tarball and source code history 


Fig. 2. Workflow for the two operation modes of seminator: semi-determinizing and 
complementing via semi-determinization. 


are hosted on GitHub (https://github.com/mklokocka/seminator). The package
contains sources of the tool with two user-interfaces (a command-line tool and 
Python bindings), a test-suite, and some documentation. 

Seminator is implemented in C++ on top of the data-structures provided 
by the Spot library [8], and reuses its input/output functions, simplification 
algorithms, and the NCSB complementation. The main implementation effort 
lies in the optimized semi-determinization and an alternative version of NCSB. 

The first user interface is a command-line tool called seminator. Its
high-level workflow is pictured in Fig. 2. By default (top part of Fig. 2) it
takes a TGBA (or TBA or BA) as input and produces a semi-deterministic TGBA
(or TBA or BA if requested). Figure 2 details various switches that control the
optional simplifications and acceptance transformations that occur before the
semi-determinization itself. The pre- and post-processing are provided by the
Spot library. The semi-determinization algorithm can be adjusted by additional
command-line options (not shown in Fig. 2) that enable or disable the
optimizations of Sect. 2. As Spot's simplification routines are stronger on
automata with simpler acceptance conditions, it sometimes pays off to convert
the automaton to TBA or BA first. If the input is a TGBA, seminator attempts three
semi-determinizations, one on the input TGBA, one on its TBA equivalent, and
one on its BA equivalent; only the smallest result is retained. If the input is 
already a TBA (resp. a BA), only the last two (resp. one) routes are attempted. 

The --complement option activates the bottom part of Fig. 2 with two vari- 
ants of the NCSB complementation [4]: "spot" stands for a transition-based 
adaptation of the original algorithm (implemented in Spot); "pldi" refers to 
its modification based on the optimization by Chen et al. [6, Section 5.3] (imple- 
mented in Seminator 2). Both variants take a TBA as input and produce a TBA. 
The options --tba and --ba apply on the final complement automaton only. 

The seminator tool can now process automata in batch, making it possible
to build pipelines with other commands. For instance, the pipeline

    ltl2tgba <input.ltl | seminator | autfilt --states=3.. >output.hoa

uses Spot's ltl2tgba command to read a list of LTL formulas from input.ltl
and transform it into a stream of TGBAs that is passed to seminator, which
transforms them into semi-deterministic TGBAs, and finally Spot's autfilt
saves into output.hoa the automata with 3 states or more.

Python bindings form the second user interface and are installed by the
Seminator package as an extension of Spot's own Python bindings. They offer
several functions, all working with Spot's automata (twa_graph objects):


semi_determinize() implements the semi-determinization procedure;

complement_semidet() implements the "pldi" variant of the NCSB
complementation for semi-deterministic automata (the other variant is available
under the same function name in the bindings of Spot);

highlight_components() and highlight_cut() provide ways to highlight the
nondeterministic and the deterministic parts of a semi-deterministic
automaton, and its cut-transitions;

seminator() provides an interface similar to the command-line seminator tool
with options that selectively enable or disable optimizations or trigger
complementation.


The Python bindings integrate well with the interactive notebooks of Jupyter [17].
Figure 3 shows an example of such a notebook, using the seminator() and
highlight_components() functions. Additional Jupyter notebooks, distributed
with the tool, document the effect of the various optimization options.¹


4 Experimental Evaluation 


We evaluate the performance of Seminator 2 for both semi-determinization and 
complementation of TGBAs. We compare our tool against several tools listed in 
Table 1. As ltl2ldgba needs LTL as input, we used the set of 221 LTL formulas
already considered for benchmarking in the literature [9, 12, 14, 22, 27]. To provide
TGBAs as input for Seminator 2, we use Spot's ltl2tgba to convert the LTL
formulas. Based on the automata produced by ltl2tgba, we distinguish three


1 https://nbviewer.jupyter.org/github/mklokocka/seminator/tree/v2.0/notebooks/.
In [1]: import spot
        from spot.jupyter import display_inline
        from spot.seminator import seminator, highlight_components
        spot.setup(show_default='.v')  # vertical display


In [2]: nba = spot.translate('F(a & GFb) R c')
        sdba = seminator(nba)
        dba = spot.postprocess(nba, 'det')
        # add some colors
        spot.highlight_nondet_edges(nba, 5)
        highlight_components(sdba)


In [3]: display_inline(nba, sdba, dba)


[Rendered automata, left to right: nba, sdba, dba; each has Büchi acceptance Inf(⓿).]


In [4]: assert dba.num_states() == spot.sat_minimize(dba).num_states()


Fig. 3. Jupyter notebook illustrating a case where a nondeterministic TBA (nba, left)
has an equivalent semi-deterministic TBA (sdba, middle) that is smaller than a minimal
deterministic TBA (dba, right). Accepting transitions are labeled by ⓿.


categories of formulas: deterministic (152 formulas), semi-deterministic but not 
deterministic (49 formulas), and not semi-deterministic (20 formulas). This divi- 
sion is motivated by the fact that Seminator 2 applies its semi-determinization 
only on automata that are not semi-deterministic, and that some complemen- 
tation tools use different approaches to deterministic automata. We have also 
generated 500 random LTL formulas of each category. 

The scripts and formulas used in those experiments can be found online,²
as well as a Docker image with these scripts and all the tools installed.³ All
experiments were run inside the supplied Docker image on a Dell XPS13 laptop
with Intel i7-1065G7, 16 GB RAM, and running Linux.


2 https://github.com/xblahoud/seminator-evaluation/.
3 https://hub.docker.com/r/gadl/seminator.
Fig. 4. Comparison of the sizes of the semi-deterministic automata produced by
Seminator 2 and Owl for the not semi-deterministic random set.


Table 2. Comparison of semi-determinization tools. A benchmark set marked with
x + y consists of x formulas for which all tools produced some automaton, and y
formulas leading to some timeouts. A cell of the form s (m) shows the cumulative
number s of states of automata produced for the x formulas, and the number m of
formulas for which the tool produced the smallest automaton out of the obtained
automata. The best results in each column are highlighted.


                   (semi-)deterministic        not semi-deterministic
                   literature   random         literature   random
# of formulas      200 + 1      1000 + 0       19 + 1       500 + 0
Owl+best           1092 (102)   6335 (454)     281 (6)      5041 (144)
Owl+best+Spot      978 (139)    5533 (724)     234 (11)     4153 (268)
Seminator 1.1      787 (201)    4947 (963)     297 (7)      7020 (60)
Seminator 2        787 (201)    4947 (963)     230 (16)     3956 (356)


4.1 Semi-determinization 


We compare Seminator 2 to its older version 1.1 and to ltl2ldgba of Owl. We
do not include Buchifier [16] as it is available only as a binary for Windows.
Also, we did not include nba2ldba [26] due to the lack of space and the fact that
even Seminator 1.1 performs significantly better than nba2ldba [5].

Recall that Seminator 2 calls Spot's automata simplification routines on
constructed automata. To get a fair comparison, we apply these routines also to the
results of other tools, indicated by +Spot in the results. Further, ltl2ldgba
of Owl can operate in two modes: --symmetric and --asymmetric. For each
formula, we run both settings and pick the better result, indicated by +best.

Table 2 presents the cumulative results for each semi-determinization tool and
each benchmark set (we actually merged deterministic and semi-deterministic 
benchmark sets). The timeout of 30 s was reached by Owl for one formula in 
Table 3. Comparison of tools complementing Büchi automata, using the same
conventions as Table 2.


                 deterministic            semi-deterministic       not semi-deterministic
                 literature  random       literature  random       literature  random
# of formulas    147 + 5     500 + 0      47 + 2      499 + 1      15 + 5      486 + 14
ROLL+Spot        1388 (0)    3687 (0)     833 (0)     5681 (4)     272 (0)     6225 (58)
Fribourg+Spot    627 (137)   2493 (464)   290 (26)    3294 (258)   142 (14)    5278 (238)
GOAL+Spot        617 (143)   2490 (477)   277 (28)    3676 (125)   206 (5)     7713 (96)
Spot             611 (150)   2477 (489)   190 (40)    2829 (354)   181 (9)     5310 (202)
Seminator 2      622 (142)   2511 (465)   210 (37)    2781 (420)   169 (8)     4919 (277)


the (semi-)deterministic category and by Seminator 1.1 for one formula in the 
not semi-deterministic category. Besides timeouts, the running times of all tools 
were always below 3 s, with a few exceptions for Seminator 1.1. 

In the (semi-)deterministic category, the automaton produced by ltl2tgba
and passed to both versions of Seminator is already semi-deterministic. Hence,
both versions of Seminator have nothing to do. This category, in fact, compares
ltl2tgba of Spot against ltl2ldgba of Owl.

Figure 4 shows the distribution of differences between semi-deterministic
automata produced by Owl+best+Spot and Seminator 2 for the not
semi-deterministic random set. A dot at coordinates (x,y) represents a formula for
which Owl and Seminator 2 produced automata with x and y states, respectively. 

We can observe a huge improvement brought by Seminator 2 in not
semi-deterministic benchmarks: while in 2017 Seminator 1.1 produced a smaller
automaton than Owl in only a few cases in this category [5], Seminator 2 is now
more than competitive even though Owl has also been improved over
time.
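
The size of this improvement can be read off Table 2 directly. The short Python computation below only re-derives percentages from the cumulative state counts reported in the not semi-deterministic columns of that table; the helper name reduction is ours.

```python
# Cumulative state counts copied from Table 2 (not semi-deterministic sets).
states = {
    "literature": {"Owl+best+Spot": 234, "Seminator 1.1": 297, "Seminator 2": 230},
    "random": {"Owl+best+Spot": 4153, "Seminator 1.1": 7020, "Seminator 2": 3956},
}


def reduction(old, new):
    """Percentage by which `new` is smaller than `old`, to one decimal."""
    return round(100 * (old - new) / old, 1)


for bench, row in states.items():
    for other in ("Seminator 1.1", "Owl+best+Spot"):
        saved = reduction(row[other], row["Seminator 2"])
        print(f"{bench}: Seminator 2 uses {saved}% fewer states than {other}")
```

On the random set, the output of Seminator 2 has about 43.6% fewer states in total than that of Seminator 1.1 and about 4.7% fewer than that of Owl+best+Spot; on the literature set the corresponding figures are about 22.6% and 1.7%.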


4.2 Complementation 


We compare Seminator 2 with the complementation of ROLL based on 
automata learning (formerly presented as Buechic), the determinization-based 
algorithm [23] implemented in GOAL, the asymptotically optimal Fribourg com- 
plementation implemented as a plugin for GOAL, and with Spot (autfilt 
--complement). We apply the simplifications from Spot to all results and we 
use Spot's ltl2tgba to create the input Büchi automata for all tools, using
transition-based generalized acceptance or state-based acceptance as appropri- 
ate (only Seminator 2 and Spot can complement transition-based generalized 
Büchi automata). The timeout of 120 s was reached once by both Seminator 2 
and Spot, 6 times by Fribourg, and 13 times by GOAL and ROLL. 

Table 3 shows results for complementation in the same way as Table 2 does
for semi-determinization. For the deterministic benchmark, we can see quite 
Fig. 5. Comparison of Seminator 2 against Spot and Fribourg+Spot in terms of the
sizes (i.e., number of states) of complement automata produced for the not
semi-deterministic random benchmark. Note that axes are logarithmic.

Fig. 6. Running times of complementation tools on the 83 hard cases of the not
semi-deterministic random benchmark. The running times of each tool on these cases are
sorted increasingly before being plotted.


similar results from all tools but ROLL. This is caused by the fact that
complementation of deterministic automata is easy. Some tools (including Spot) even
apply a dedicated complementation procedure. It comes as no surprise that the
specialized algorithm of Seminator 2 performs better than most other
complementations in the semi-deterministic category. Interestingly, this carries over to
the not semi-deterministic category. The results demonstrate that the 2-step
approach of Seminator 2 to complementation performs well in practice. Figure 5
offers more detailed insight into the distribution of automata sizes created by
Seminator 2, Spot, and Fribourg+Spot for random benchmarks in this category.

Finally, Fig. 6 compares the running times of these tools over the 83 hard
cases of the not semi-deterministic random benchmark (a case is hard if at least
one tool did not finish in 10 s). We can see that Seminator 2 and Spot run
significantly faster than the other tools.
5 Conclusion


We have presented Seminator 2, which is a substantially improved version of 
Seminator 1.1. The tool now offers a competitive complementation of TGBA. 
Furthermore, the semi-determinization code was rewritten and offers new opti- 
mizations that significantly reduce the size of produced automata. Finally, new 
user-interfaces enable convenient processing of large automata sets thanks to the 
support of pipelines and batch processing, and versatile applicability in educa- 
tion and research thanks to the integration with Spot’s Python bindings. 
Abstract. The autonomous control of unmanned aircraft is a highly
safety-critical domain with great economic potential in a wide range of 
application areas, including logistics, agriculture, civil engineering, and 
disaster recovery. We report on the development of a dynamic moni- 
toring framework for the DLR ARTIS (Autonomous Rotorcraft Testbed 
for Intelligent Systems) family of unmanned aircraft based on the for- 
mal specification language RTLola. RTLola is a stream-based specifica- 
tion language for real-time properties. An RTLola specification of haz- 
ardous situations and system failures is statically analyzed in terms of 
consistency and resource usage and then automatically translated into 
an FPGA-based monitor. Our approach leads to highly efficient, par- 
allelized monitors with formal guarantees on the noninterference of the 
monitor with the normal operation of the autonomous system. 


Keywords: Runtime verification · Stream monitoring · FPGA ·
Autonomous aircraft


1 Introduction 


An unmanned aerial vehicle, commonly known as a drone, is an aircraft with- 
out a human pilot on board. While usually connected via radio transmissions 
to a base station on the ground, such aircraft are increasingly equipped with 
decision-making capabilities that allow them to autonomously carry out com- 
plex missions in applications such as transport, mapping and surveillance, or crop 
and irrigation monitoring. Despite the obvious safety-criticality of such systems, 
it is impossible to foresee all situations an autonomous aircraft might encounter 
and thus make a safety case purely by analyzing all of the potential behaviors 
in advance. A critical part of the safety engineering of a drone is therefore to 
carefully monitor the actual behavior during the flight, so that the health status 
of the system can be assessed and mitigation procedures (such as a return to the 
base station or an emergency landing) can be initiated when needed. 

In this paper, we report on the development of a dynamic monitoring
framework for the DLR ARTIS (Autonomous Rotorcraft Testbed for Intelligent
Systems) family of aircraft based on the formal specification language RTLola.
The development of a monitoring framework for an autonomous aircraft differs 
significantly from a monitoring framework in a more standard setting, such as 
network monitoring. A key consideration is that while the specification language 
needs to be highly expressive, the monitor must operate within strictly limited 
resources, and the monitor itself needs to be highly reliable: any interference 
with the normal operation of the aircraft could have fatal consequences. 

A high level of expressiveness is necessary because the assessment of the 
health status requires complex analyses, including a cross-validation of differ- 
ent sensor modules such as the agreement between the GPS module and the 
accelerometer. This is necessary in order to discover a deterioration of a sensor 
module. At the same time, the expressiveness and the precision of the moni- 
tor must be balanced against the available computing resources. The reliability 
requirement goes beyond pure correctness and robustness of the execution. Most 
importantly, reliability requires that the peak resource consumption of the mon- 
itor in terms of energy, time, and space needs to be known ahead of time. This 
means that it must be possible to compute these resource requirements statically 
based on an analysis of the specification. The determination whether the drone 
is equipped with sufficient hardware can then be made before the flight, and the 
occurrence of dynamic failures such as running out of memory or sudden drops 
in voltage can be ruled out. Finally, the collection of the data from the on-board 
architecture is a non-trivial problem: While the monitor needs access to almost 
the complete system state, the data needs to be retrieved non-intrusively such 
that it does not interfere with the normal system operation. 

Our monitoring approach is based on the formal stream specification
language RTLola [11]. In an RTLola specification, input streams that collect
data from sensors, networks, etc., are filtered and combined into output streams
that contain data aggregated from multiple sources and over multiple points in
time such as over sliding windows of some real-time length. Trigger conditions
over these output streams then identify critical situations. An RTLola
specification is translated into a monitor defined in a hardware description language
and subsequently realized on an FPGA. Before deployment, the specification is 
checked for consistency and the minimal requirements on the FPGA are com- 
puted. The hardware monitor is then placed in a central position where as much 
sensor data as possible can be collected; during the execution, it then extracts 
the relevant information. In addition to requiring no physical changes to the 
system architecture, this integration incurs no further traffic on the bus. 
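
The interplay of input streams, sliding-window aggregation, and triggers can be illustrated with a small software simulation. The Python sketch below is not RTLola and says nothing about the generated hardware; the class SlidingWindow, the function monitor, and the choice of an average over the window are assumptions made for illustration only.

```python
from collections import deque


class SlidingWindow:
    """Keeps the timestamped samples of the last `length` seconds."""

    def __init__(self, length):
        self.length = length
        self.samples = deque()  # pairs (timestamp, value), oldest first

    def push(self, timestamp, value):
        self.samples.append((timestamp, value))
        # Drop samples that have left the window (timestamp - length, timestamp].
        while self.samples[0][0] <= timestamp - self.length:
            self.samples.popleft()

    def average(self):
        return sum(v for _, v in self.samples) / len(self.samples)


def monitor(events, length, limit):
    """Input stream -> windowed average (output stream) -> trigger.

    events -- iterable of (timestamp, value), ordered by timestamp
    Returns the timestamps at which the average exceeded `limit`.
    """
    window = SlidingWindow(length)
    alarms = []
    for timestamp, value in events:
        window.push(timestamp, value)
        if window.average() > limit:  # trigger condition
            alarms.append(timestamp)
    return alarms
```

In this simulation the memory needed by a window depends on how many samples can arrive within its length, which hints at why a bound on input frequencies allows resource requirements to be computed before deployment.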

Our experience has been extremely positive: Our approach leads to highly 
efficient, parallelized monitors with formal guarantees on the non-interference of 
the monitor with the normal operation of the autonomous system. The monitor 
is able to detect violations to complex specifications without intruding into the 
system execution, and operates within narrow resource constraints. RTLola is
cleared for take-off. 
1.1 Related Work


Stream-based monitoring approaches focus on an expressive specification
language while handling non-binary data. Their roots lie in synchronous, declarative
stream processing languages like Lustre [13] and Lola [9]. The Copilot framework
[19] features a declarative data-flow language from which constant space and con- 
stant time C monitors are generated; these guarantees enable usage on an embed- 
ded device. Rather than focusing on data-flow, the family of Lola-languages puts 
an emphasis on statistical measures and has successfully been used to monitor 
synchronous, discrete time properties of autonomous aircraft [1,23]. In contrast 
to that, RTLola [12,22] supports real-time capabilities and efficient aggregation
of data occurring with arbitrary frequency, while forgoing parametrization
for efficiency [11]. RTLola can also be compiled to VHDL and subsequently
realized on an FPGA [8]. 

Apart from stream-based monitoring, there is a rich body of monitoring 
based on real-time temporal logics [2, 10, 14–16, 20] such as Signal Temporal Logic
(STL) [17]. Such languages are a concise way to describe temporal behaviors
with the shortcoming that they are usually limited to qualitative statements,
i.e., Boolean verdicts. This limitation was addressed for STL [10] by introducing
a quantitative semantics indicating the robustness of a satisfaction. To specify 
continuous signal patterns, specification languages based on regular expressions 
can be beneficial, e.g. Signal Regular Expressions (SRE) [5]. The R2U2 tool [18] 
stands out in particular as it successfully brought a logic closely related to STL 
onto unmanned aerial systems as an external hardware implementation. 


2 Setup 


The Autonomous Rotorcraft Testbed for Intelligent Systems (ARTIS) is a plat- 
form used by the Institute of Flight Systems of the German Aerospace Center 
(DLR) to conduct research on autonomous flight. It consists of a set of unmanned 
helicopters and fixed-wing aircraft of different sizes which can be used to develop 
new techniques and evaluate them under real-world conditions. 

The case study presented in this paper revolves around the superARTIS, a
large helicopter with a maximum payload of 85 kg, depicted in Fig. 1. The high
payload capabilities allow the aircraft to carry multiple sensor systems,
computational resources, and data links. This extensive range of avionic equipment
plays an important role in improving the situational awareness of the aircraft [3]
during the flight. It facilitates safe autonomous research missions which include 
flying in urban or maritime areas, alone or with other aircraft. Before an actual 
flight test, software- and hardware-in-the-loop simulations, as well as real-time 
logfile replays strengthen confidence in the developed technology. 


2.1 Mission 


One field of application for unmanned aerial vehicles (UAVs) is reconnaissance 
missions. In such missions, the aircraft is expected to operate within a fixed area 
in which it can cause no harm. The polygonal boundary of this area is called
a geo-fence. As soon as the vehicle passes the geo-fence, mitigation procedures 
need to be initiated to ensure that the aircraft does not stray further away from 
the safe area. 
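
Checking whether a position lies within a polygonal geo-fence is a standard point-in-polygon test. The Python sketch below uses the classic ray-casting method; it is an illustration rather than the check used on the aircraft, it assumes planar coordinates (e.g. after projecting GPS positions onto a local plane), and the function name inside_geofence is hypothetical.

```python
def inside_geofence(point, fence):
    """Ray-casting test for a point against a polygon.

    point -- (x, y) position in planar coordinates
    fence -- list of polygon vertices (x, y) in order
    Points exactly on the boundary are not treated specially.
    """
    x, y = point
    inside = False
    n = len(fence)
    for k in range(n):
        x1, y1 = fence[k]
        x2, y2 = fence[(k + 1) % n]
        # Does the edge cross the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge meets that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # one more crossing to the right
    return inside
```

A monitor could evaluate such a test on every position update and raise a trigger as soon as the result becomes false.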

The case study presented in this paper features a reconnaissance mission. 
Figure 2 shows the flight path (blue line) within a geo-fence (red line). Evidently, 
the aircraft violates the fence several times temporarily. A reason for this can be 
flawed position estimation: An aircraft estimates its position based on several 
factors such as landmarks detected optically or GPS sensor readings. In the 
latter case, GPS satellites send position and time information to earth. The 
GPS module uses this data to compute the aircraft’s absolute position with 
trilateration. However, signal reflection or a low number of GPS satellites in 
range can result in imprecisions in the position approximation. If the aircraft 
is continuously exposed to imprecise position updates, the error adds up and 
results in a strong deviation from the expected flight path. 

The impact of this effect can be seen in Fig.3. It shows the velocity of a 
ground-borne aircraft in an enclosed backyard according to its GPS module.¹
During the reported period of time, the aircraft was pushed across the backyard 
by hand. While the expected graph is a smooth curve, the actual measurements 
show an erratic curve with errors of up to ±1.5 m s⁻¹, which can be mainly
attributed to signals being reflected on the enclosure. The strictly positive trend 
of the horizontal velocity can explain strong deviations from the desired flight 
path seen in Fig. 2.

A counter-measure to these imprecisions is the cross-validation of several 
redundant sensors. As an example, rather than just relying on the velocity 
reported by a GPS module, its measured velocity can be compared to the inte- 
grated output of an accelerometer. When the values deviate strongly, the values 
can be classified as less reliable than when both sensors agree. 
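
The cross-validation idea can be sketched in a few lines of Python. This is a minimal illustration, not the monitor used on the superARTIS: the function names, the rectangle-rule integration, the fixed sampling interval `dt`, and the threshold are all assumptions made for the example.

```python
def integrate_acceleration(acc, dt, v0=0.0):
    """Integrate acceleration samples (m/s^2) into velocities (m/s).

    Uses the rectangle rule with a fixed sampling interval dt (seconds).
    """
    velocities = []
    v = v0
    for a in acc:
        v += a * dt
        velocities.append(v)
    return velocities


def unreliable_samples(gps_vel, acc, dt, threshold):
    """Return the indices at which GPS velocity and integrated
    accelerometer output deviate by more than `threshold` (m/s)."""
    imu_vel = integrate_acceleration(acc, dt)
    return [
        i
        for i, (g, v) in enumerate(zip(gps_vel, imu_vel))
        if abs(g - v) > threshold
    ]
```

Samples whose index is returned would be treated as less reliable; how the system reacts to them is outside the scope of this sketch.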


2.2 Non-Intrusive Instrumentation 


When integrating the monitor into an existing system, the system architecture 
usually cannot be altered drastically. Moreover, the monitor should not interfere 
with the regular execution of the system, e.g. by requiring the controller to send 
explicit messages to it. Such a requirement could offset the timing behavior and 
thus have a negative impact on the overall performance of the system. 

The issue can be circumvented by placing the monitor at a point where it can 
access all data necessary for the monitoring process non-intrusively. In the case 
of the superARTIS, the logger interface provides such a place as it compiled 
the data of all position-related sensors as well as the output of the position 
estimation [3,4]. Figure 4 outlines the relevant data lines of the aircraft. Sensors 
were polled with fixed frequencies of up to 100 Hz. The schematic shows that the 
logger explicitly sends data to the monitor. This is not a strict requirement of 


1 GPS modules only provide absolute position information; the first derivative thereof, 
however, is the velocity. 
Fig. 1. DLR's autonomous superARTIS equipped with optical navigation.

Fig. 2. Reconnaissance mission for a UAV. The thin blue line represents its
trajectory, the thick red line a geo-fence. Axes: longitude and latitude.


the monitor as it could be connected to the data buses leading to the logger and 
passively read incoming data packets. However, in the present setting, the logger 
did not run at full capacity. Thus sending information to the monitor came at 
no relevant cost while requiring few hardware changes to the bus layout. 

In turn, the monitor provides feedback regarding violations of the specifica- 
tion. Here, we distinguish between different timing behaviors of triggers. The 
monitor evaluates event-based triggers whenever the system passes new events 
to the monitor and immediately replies with the results. For periodic triggers,
i.e., those annotated with an evaluation frequency, the evaluation is decoupled
from the communication between monitor and system. Thus, the monitor needs
to wait until it receives another event before reporting the verdict. This incurs a
short delay between detection and report.


2.3 StreamLAB 


STREAMLAB² [11] is a monitoring framework revolving around the stream-based
specification language RTLOLA. It emphasizes analyses conducted
before deployment of the monitor. This increases the confidence in a successful 
execution by providing information to aid the specifier. To this end, it detects 
inconsistencies in the specification such as type errors, e.g. a lossy conversion
of a floating point number to an integer, or timing errors, e.g. accessing values 
that might not exist. Further, it provides two execution modes: an interpreter 
and an FPGA compilation. The interpreter allows the specifier to validate their 
specification. For this, it requires a trace, i.e. a series of data that is expected to 
occur during an execution of the system. It then checks whether a trace complies 
with the specification and reports the points in time when specified bounds are 
violated. After successfully validating the specification, it can be compiled into 
VHDL code. Yet again, the compiled code can be analyzed with respect to the 
space and power consumption. This information allows for evaluating whether 
the available hardware suffices for running the RTLOLA monitor. 


² www.stream-lab.eu.
An RTLOLA specification consists of input and output streams, as well
as trigger conditions. Input streams describe data the system produces asyn- 
chronously and provides to the monitor. Output streams use this data to assess 
the health state of the system e.g. by computing statistical information. Trig- 
ger conditions distinguish desired and undesired behavior. A violation of the 
condition issues an alarm to the system. 

The following specification declares a floating point input stream height 
representing sensor readings of an altimeter. The output stream avg_height
computes the average value of the height stream over two minutes. The
aggregation is a sliding window computed once per second, as indicated with the
@1Hz annotation. The stream δheight computes the difference between the
average and the current height. A strong deviation of these values constitutes a 
suspicious jump in sensor readings, which might indicate a faulty sensor or an 
unexpected loss or gain in height. In this case, the trigger in the specification 
issues a warning to the system, which can initiate mitigation measures. 


input height: Float32

output avg_height @1Hz := height.aggregate(over: 2min, using: avg)
output δheight := abs(avg_height.hold().defaults(to: height) - height)
trigger δheight > 50.0 "WARNING: Suspicious jump in height."


Note that this is just a brief introduction to RTLOLA and the STREAMLAB 
framework. For more details, we refer the reader to [8, 11, 12, 22].
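
To make the intent of the height specification concrete, the following Python sketch mimics its behavior. It is a simplified illustration under stated assumptions, not RTLOLA semantics: the class name and method are hypothetical, and the average is recomputed on every incoming event rather than at a decoupled 1 Hz rate.

```python
from collections import deque


class HeightJumpMonitor:
    """Flags height readings that deviate strongly from the recent average."""

    def __init__(self, window_s=120.0, threshold=50.0):
        self.window_s = window_s    # sliding window length in seconds
        self.threshold = threshold  # allowed deviation from the average
        self.samples = deque()      # (timestamp, height) pairs in the window
        self.total = 0.0            # running sum of heights in the window

    def update(self, t, height):
        """Process one reading; return True if it looks like a jump."""
        # Evict readings that have left the sliding window.
        while self.samples and self.samples[0][0] <= t - self.window_s:
            _, old = self.samples.popleft()
            self.total -= old
        # Average of the earlier readings; default to the current reading
        # when the window is empty (mirrors `defaults(to: height)`).
        avg = self.total / len(self.samples) if self.samples else height
        alarm = abs(avg - height) > self.threshold
        self.samples.append((t, height))
        self.total += height
        return alarm
```

Keeping a running sum and evicting expired samples is one simple way to maintain the window; the memory-bounded technique actually used by RTLOLA is described in the cited work.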


2.4 FPGA as Monitoring Platform 


An RTLOLA specification can be compiled into the hardware description lan- 
guage VHDL and subsequently realized on an FPGA as proposed by Baumeis- 
ter et al. [8]. An FPGA as target platform for the monitor has several advantages 


3 Details on how such a computation can cope with a statically-bounded amount of 
memory can be found in [12,22]. 
in terms of improving the development process, reducing its cost, and increasing
the overall confidence in the execution. 

Since the FPGA is a separate module and thus decoupled from the con- 
trol software, these components do not share processor time or memory. This 
especially means that control and monitoring computations happen in parallel. 
Further, the monitor itself parallelizes the computation of independent RTLOLA 
output streams with almost no additional overhead. This significantly acceler- 
ates the monitoring process [8]. The compiled VHDL specification allows for 
extensive static analyses. Most notably, the results include whether the board 
is sufficiently large in terms of look-up tables and storage capabilities to host 
the monitor, and the power consumption when idle or at peak performance. 
Lastly, an FPGA is the sweet spot between generality and specificity: it runs 
faster, is lighter, and consumes less energy than general purpose hardware while 
retaining a similar time-to-deployment. The latter combined with a drastically 
lower cost renders the FPGA superior to application-specific integrated circuits 
(ASICs) during the development phase. After that, when the specification is fixed,
an ASIC might be considered for its even higher performance.


2.5 RTLola Specifications 


The entire specification for the mission comprises three sub-specifications.
This section briefly outlines each of them and explains representative proper- 
ties in Fig. 5. The complete specifications as well as a detailed description were 
presented in earlier work [6,21] and the technical report of this paper [7]. 


Sensor Validation. Sensors can produce incorrect values, e.g. when too few 
GPS satellites are in range for an accurate trilateration or if the aircraft flies 
above the range of a radio altimeter. A simple exemplary validation is to 
check whether the measured altitude is non-negative. If such a check fails, 
the values are meaningless, so the system should not take them into account 
in its computations. 

Geo-Fence. During the mission, the aircraft has permission to fly inside a zone 
delimited by a polygon, called a geo-fence. The specification checks whether 
a face of the fence has been crossed, in which case the aircraft needs to 
ensure that it does not stray further from the permitted zone. 

Sensor Cross-Validation. Sensor redundancy allows for validating a sensor 
reading by comparing it against readings of other sensors. An agreement 
between the values raises the confidence in their correctness. An example is 
the cross-validation of the GPS module against the accelerometer. Integrat- 
ing the readings of the latter twice yields an absolute position which can be 
compared against the GPS position. 
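
The geo-fence check boils down to testing whether the segment between two consecutive position estimates intersects a face of the polygon. The sketch below shows one standard way to do this with orientation tests. It is an illustration only: the function names are hypothetical, positions are treated as planar 2D points, and touching or collinear cases are deliberately ignored.

```python
def orientation(p, q, r):
    """Cross product of (q - p) and (r - p).

    Positive for a left turn, negative for a right turn, zero if collinear.
    """
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def segments_cross(a, b, c, d):
    """True if segment a-b properly crosses segment c-d."""
    d1 = orientation(c, d, a)
    d2 = orientation(c, d, b)
    d3 = orientation(a, b, c)
    d4 = orientation(a, b, d)
    # The endpoints of each segment must lie on opposite sides of the other.
    return d1 * d2 < 0 and d3 * d4 < 0


def crossed_faces(prev_pos, cur_pos, fence):
    """Indices of fence faces crossed when moving from prev_pos to cur_pos.

    `fence` is a list of polygon vertices; face i connects vertex i to
    vertex (i + 1) modulo the number of vertices.
    """
    n = len(fence)
    return [
        i
        for i in range(n)
        if segments_cross(prev_pos, cur_pos, fence[i], fence[(i + 1) % n])
    ]
```

Each face is tested independently of the others, which is what makes it natural to evaluate all faces in parallel on hardware.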


Figure 5 shows some representative sub-properties of the previously
described specifications in RTLOLA; the full specifications are too long to discuss in detail.
It contains a validation of GPS readings as well as a cross-validation of the GPS 
module against the Inertial Measurement Unit (IMU). The specification declares 
input gps_x: Float16       // Absolute x position from GPS module
input num_sat: UInt8       // Number of GPS satellites in range
input imu_acc_x: Float32   // Acceleration in x direction from IMU

// Check if the GPS module emitted few readings in the last 3s.
trigger @1Hz gps_x.aggregate(over: 3s, using: count) < 10
    "VIOLATION: Few GPS updates"

// 1 if there are few GPS satellites in range, otherwise 0.
output few_sat: UInt8 := Int(num_sat < 9)

// Check if there rarely were enough GPS satellites in range.
trigger @1Hz few_sat.aggregate(over: 5s, using: Σ) > 12
    "WARNING: Unreliable GPS data."

// Integrate acceleration twice to obtain absolute position.
output imu_vel_x @1Hz := imu_acc_x.aggregate(over: ∞, using: ∫)
output imu_x @1Hz := imu_vel_x.aggregate(over: ∞, using: ∫)

// Issue an alarm if readings from GPS and IMU disagree.
trigger abs(imu_x - gps_x) > 0.5
    "VIOLATION: GPS and IMU readings deviate."


Fig. 5. An RTLOLA specification validating GPS sensor data and cross-validating
readings from the GPS module and IMU.


three input streams, the x-position and number of GPS satellites in range from 
the GPS module, and the acceleration in x-direction according to the IMU. 

The first trigger counts the number of updates received from the GPS module 
by counting how often the input stream gps_x gets updated to validate the
timing behavior of the module. 

The output stream few_sat computes the indicator function for
num_sat < 9, which indicates that the GPS module might report unreliable
data due to few satellites in range. If this happens more than 12 times within
five seconds, the next trigger issues a warning to indicate that the incoming GPS
values might be inaccurate. The last trigger checks whether the double integral
of the IMU acceleration coincides with the GPS position up to a threshold of
0.5 m.
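
As a rough illustration of the satellite-count trigger, the following Python sketch counts how often too few satellites were reported within the last five seconds. The function name and the representation of readings as (timestamp, value) pairs are assumptions made for this example; the thresholds are the ones from Fig. 5.

```python
def unreliable_gps(num_sat_events, now, window_s=5.0, min_sats=9, max_low=12):
    """Return True if more than `max_low` readings within the last
    `window_s` seconds reported fewer than `min_sats` satellites.

    `num_sat_events` is an iterable of (timestamp, num_sat) pairs.
    """
    low_readings = sum(
        1
        for t, n in num_sat_events
        # Indicator function: 1 for a recent reading with few satellites.
        if now - window_s < t <= now and n < min_sats
    )
    return low_readings > max_low
```

Summing the indicator values over the window corresponds to the aggregation over few_sat in the specification.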


2.6 VHDL Synthesis 


The specifications mentioned above were compiled into VHDL and realized on
the Xilinx ZC702 Base Board⁴. The following table details the resource
consumption of each sub-specification reported by the synthesis tool Vivado. The
number of flip-flops (FF) indicates the memory consumption in bits; none of the
specifications requires more than 600 B of memory. The number of look-up tables
(LUTs) is an indicator for the complexity of the logic. The sensor validation,
despite being significantly longer than the cross-validation, requires the least


⁴ https://www.xilinx.com/support/documentation/boards_and_kits/zc702_zvik/ug850-zc702-eval-bd.pdf.
Spec       | FF    | FF [%] | LUT    | LUT [%] | MUX | Idle [mW] | Peak [W]
Geo-fence  | 2,853 | 3      | 26,181 | 71      | 4   | 149       | 1.871
Validation | 4,792 | 5      | 34,630 | 67      | 104 | 156       | 2.085
Cross      | 3,441 | 4      | 23,261 | 46      | 99  | 150       | 1.911


amount of LUTs. The reason is that its computations are simple in compari- 
son: Rather than computing sliding window aggregations or line intersections, 
it mainly consists of simple thresholding. The number of multiplexers (MUX) 
reflects this as well: Since thresholding requires comparisons, which translate to 
multiplexers, the validation requires twice as many of them. Lastly, the power 
consumption of the monitor is extremely low: when idle, none of the specifications
requires more than 156 mW, and even under peak pressure, the power
consumption does not exceed 2.1 W. For comparison, a Raspberry Pi needs between 1.1 W
(Model 2B) and 2.7 W (Model 4B) when idle and roughly twice as much under
peak pressure, i.e., 2.1 W and 6.4 W, respectively.⁵
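
The memory bound quoted above can be checked directly from the table, under the assumption stated in the text that each flip-flop stores one bit. The snippet below is only a worked arithmetic example; the dictionary and function names are hypothetical.

```python
# Flip-flop counts per sub-specification, taken from the table above.
FLIP_FLOPS = {"Geo-fence": 2853, "Validation": 4792, "Cross": 3441}


def ff_bytes(bits):
    """Convert a flip-flop count (one bit each) to bytes, rounding up."""
    return (bits + 7) // 8


# Memory footprint of each sub-specification in bytes.
memory = {spec: ff_bytes(bits) for spec, bits in FLIP_FLOPS.items()}
```

The largest entry, the sensor validation, comes to 4,792 / 8 = 599 bytes, which is just below the 600 B bound.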

Note that the geo-fence specification checks for 12 intersections in parallel, 
one for each face of the fence (cf. Fig. 2). Adapting the number of faces allows 
for scaling the amount of FPGA resources required, as can be seen in Fig. 6a. 
The graph does not grow linearly because the realization problem of VHDL 
code onto an FPGA is a multi-dimensional optimization problem with several 
Pareto-optimal solutions. Under default settings, the optimizer found a solution
for four faces that required fewer LUTs than for three faces. At the same time,
the worst negative slack time (WNST) of the four-face solution was lower than
the WNST for the three-face solution as well (cf. Fig. 6b), indicating that the
former performs worse in terms of running time.


3 Results 


As the title of the paper suggests, the superARTIS with the RTLOLA monitor 
component is cleared to fly and a flight test is already scheduled. In the mean- 
time, the monitor was validated on log files from past missions of the superARTIS 
replayed under realistic conditions. During a flight, the controller polls samples 
from sensors, estimates the current position, and sends the respective data to the 
logger and monitor. In the replay setting, the process remains the same except 
for one detail: Rather than receiving data from the actual sensors, the data sent
to the controller is read from a past log file at the same frequency at which it
was recorded. The timing and logging behavior is equivalent to a real execution.
This especially means that the replayed data points will be recorded again in
the same way. Control computations take place on a machine identical to the 
one on the actual aircraft. As a result, from the point of view of the monitor, 
the replay mode and the actual flight are indistinguishable. Note that the setup 


⁵ Information collected from https://www.pidramble.com/wiki/benchmarks/power-consumption in January 2020.
Fig. 6. Result of the static analysis for different numbers of faces of the geo-fence.


is open-loop, i.e., the monitor cannot influence the running system. Therefore,
the replay mode using real data is more realistic than a high-fidelity simulation. 

When monitoring the geo-fence of the reconnaissance mission in Fig.2, all 
twelve face crossings were detected successfully. Additionally, when replaying 
the sensor data of the experiment in the enclosed backyard from Sect. 2.1, the 
erratic GPS sensor data led to 113 violations regarding the GPS module on
its own. Note that many of these violations point to the same culprit: a low
number of available GPS satellites, for example, correlates with the occurrence
of peaks in the GPS velocity. Moreover, the cross-validation issued another 36
alarms due to a divergence of IMU and GPS readings. Other checks, for example 
detecting a deterioration of the GPS module based on its output frequency, were 
not violated in either flight and thus not reported. 


4 Conclusion 


We have presented the integration of a hardware-based monitor into the super- 
ARTIS UAV. The distinguishing features of our approach are the high level of 
expressiveness of the RTLOLA specification language combined with the formal 
guarantees on the resource usage. The comprehensive tool framework facilitates 
the development of complex specifications, which can be validated on log data 
before they get translated into a hardware-based monitor. The automatic anal- 
ysis of the specification derives the minimal requirements on the development 
board needed for safe operation. If they are met, the specification is realized 
on an FPGA and integrated into the superARTIS architecture. Our experience 
shows that the overall system works correctly and reliably, even without thor- 
ough system-level testing. This is due to the non-interfering instrumentation, 
the validated specification, and the formal guarantees on the absence of dynamic 
failures of the monitor. 
Abstract. We study the expressiveness and reactive synthesis problem
of HyperQPTL, a logic that specifies ω-regular hyperproperties.
HyperQPTL is an extension of linear-time temporal logic (LTL) with
explicit trace and propositional quantification and therefore truly
combines trace relations and ω-regularity. As such, HyperQPTL can express
promptness, which states that there is a common bound on the num- 
ber of steps up to which an event must have happened. We demonstrate 
how the HyperQPTL formulation of promptness differs from the type of 
promptness expressible in the logic Prompt-LTL. Furthermore, we study 
the realizability problem of HyperQPTL by identifying decidable frag- 
ments, where one decidable fragment contains formulas for promptness. 
We show that, in contrast to the satisfiability problem of HyperQPTL, 
propositional quantification has an immediate impact on the decidability 
of the realizability problem. We present a reduction to the realizability 
problem of HyperLTL, which immediately yields a bounded synthesis 
procedure. We implemented the synthesis procedure for HyperQPTL in 
the bounded synthesis tool BoSy. Our experimental results show that a
range of arbiters satisfying promptness can be synthesized.


1 Introduction 


Hyperproperties [5], which are mainly studied in the area of secure information 
flow control, are a generalization from trace properties to sets of trace proper- 
ties. That is, they relate multiple execution traces with each other. Examples are 
noninterference [20], observational determinism [34], symmetry [16], or prompt- 
ness [24], i.e., properties whose satisfaction cannot be determined by analyzing 
each execution trace in isolation. 

A number of logics have been introduced to express hyperproperties (exam- 
ples are [4,19,25]). They either add explicit trace quantification to a temporal 
logic or build on monadic first-order or second-order logics and add an equal- 
level predicate, which connects traces with each other. A comprehensive study 
comparing such hyperlogics has been initiated in [6]. 
The most prominent hyperlogic is HyperLTL [4], which extends classic
linear-time temporal logic (LTL) [26] with trace variables and explicit trace
quantification. HyperLTL has been successfully applied in (runtime)
verification (e.g., [15,21,32]), specification analysis [11,14], synthesis [12,13], and
program repair [1] of hyperproperties. As an example specification, the following
HyperLTL formula expresses observational determinism by stating that for every
pair of traces, if the observable inputs I are the same on both traces, then also
the observable outputs O have to agree:


\[ \forall \pi.\ \forall \pi'.\ \Box (I_\pi = I_{\pi'}) \rightarrow \Box (O_\pi = O_{\pi'}) \tag{1} \]
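
Observational determinism relates pairs of traces, which is what makes it a hyperproperty. The following Python sketch checks the property on a finite set of finite trace prefixes. This is only an approximation for illustration, since the formula is defined over infinite traces; the function name and the representation of a trace as a list of sets of atomic propositions are assumptions.

```python
from itertools import combinations


def observationally_deterministic(traces, inputs, outputs):
    """Check formula (1) on finite trace prefixes of equal length.

    Each trace is a list of sets of atomic propositions; `inputs` and
    `outputs` are the sets of observable input and output propositions.
    """
    for t1, t2 in combinations(traces, 2):
        same_inputs = all(
            (s1 & inputs) == (s2 & inputs) for s1, s2 in zip(t1, t2)
        )
        same_outputs = all(
            (s1 & outputs) == (s2 & outputs) for s1, s2 in zip(t1, t2)
        )
        # Equal inputs at every position must imply equal outputs.
        if same_inputs and not same_outputs:
            return False
    return True
```

No single trace can violate the property on its own; a violation always needs two traces that agree on the inputs and differ on the outputs.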


Thus, hyperlogics can not only specify functional correctness, but may also
enforce the absence of information leaks or presence of information propa- 
gation. There is a great practical interest in information flow control, which 
makes synthesizing implementations that satisfy hyperproperties highly desir- 
able. Recently [12], it was shown that the synthesis problem of HyperLTL, 
although undecidable in general, remains decidable for many fragments, such as 
the $\exists^*\forall$ fragment. Furthermore, a bounded synthesis procedure was developed,
for which a prototype implementation based on BoSy [7,9,12] showed promising 
results. 

HyperLTL is, however, intrinsically limited in expressiveness. For example, 
promptness is not expressible in HyperLTL. Promptness is a property stating 
that there is a bound b, common for all traces, on the number of steps up to 
which an event e must have happened. Additionally, just like LTL, HyperLTL can 
express neither ω-regular nor epistemic properties [2,29]. Epistemic properties
are statements about the transfer of knowledge between several components. 
An exemplary epistemic specification is described by the dining cryptographers 
problem [3]: three cryptographers sit at a table in a restaurant. Either one of the 
cryptographers or, alternatively, the NSA must pay for their meal. The question 
is whether there is a protocol where each cryptographer can find out whether the 
NSA or one of the cryptographers paid the bill, without revealing the identity 
of the paying cryptographer. 

In this paper, we explore HyperQPTL [6,29], a hyperlogic that is more 
expressive than HyperLTL. Specifically, we study its expressiveness and reactive
synthesis problem. HyperQPTL extends HyperLTL with quantification over
sequences of new propositions. What makes the logic particularly expressive is
the fact that the trace quantifiers and propositional quantifiers can be freely
interleaved. With this mechanism, HyperQPTL can not only express all
ω-regular properties over sequences of n-tuples; it truly interweaves trace
quantification and ω-regularity. For example, promptness can be stated as the following
HyperQPTL formula: 


\[ \exists b.\ \forall \pi.\ \Diamond b \wedge (\neg b \;\mathcal{U}\; e_\pi) \tag{2} \]


The formula states that there exists a sequence $s \in (2^{\{b\}})^\omega$, such that event e
holds on all traces before the first occurrence of b in s. In this paper, we argue 
that the type of promptness expressible in HyperQPTL is incomparable to the 
expressiveness of Prompt-LTL [24], a logic introduced to express promptness 
[Figure: overview of HyperQPTL fragments, grouped by no universal trace
quantifier (Sec. 4.1), a single universal trace quantifier (Sec. 4.2), and multiple
universal trace quantifiers (Sec. 4.3, linear and non-linear prefixes).]

Fig. 1. The realizability problem of HyperQPTL. To the left of and below the solid line are
the decidable fragments, to the right of and above the solid line the undecidable fragments.


properties. It is further known that HyperQPTL also subsumes epistemic exten- 
sions of temporal logics such as LTL [22], as well as the first-order hyperlogic 
FO[<, E] [6,19,29]. Its expressiveness makes HyperQPTL particularly
interesting. The model checking problem of HyperQPTL is, despite the logic being
quite expressive, decidable [29]. We also explore an alternative definition of 
HyperQPTL that would result in an even more expressive logic. However, we 
show that the logic would have an undecidable model checking problem, which 
constitutes a major drawback in the context of computer-aided verification. Fur- 
thermore, satisfiability is decidable for large fragments of the logic [6]. Decidable 
HyperQPTL fragments can be described solely in terms of their trace quantifier 
prefix. This indicates that propositional quantification has no negative impact 
on the decidability, although it greatly increases the expressiveness. We establish 
that propositional quantification, in contrast to the satisfiability problem, has 
an impact on the realizability problem: it becomes undecidable when combining 
a propositional $\forall\exists$ quantifier alternation with a single universal trace quantifier.
However, we show that the synthesis problem of large HyperQPTL fragments 
remains decidable, where one of these fragments contains promptness proper- 
ties. We partially obtain these results by reducing the HyperQPTL realizability 
problem to the HyperLTL realizability problem. Based on this reduction, we 
extended the BoSy bounded synthesis tool to also synthesize systems respecting 
HyperQPTL specifications. We provide promising experimental results of our 
prototype implementation: using BoSy and HyperQPTL specifications, we were 
able to synthesize arbiters that respect promptness. 
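
The promptness formula (2) from above asks for a single sequence for b that bounds the first occurrence of e on every trace. As a small illustration, the Python sketch below computes the tightest such bound for a finite set of finite trace prefixes. The function name and trace representation are assumptions, and a finite set always has a common bound whenever e occurs on every trace; the property only becomes restrictive for infinite sets of traces.

```python
def common_prompt_bound(traces, event="e"):
    """Return the smallest position p such that `event` occurs at or
    before position p on every trace, or None if some trace never
    contains the event.

    Each trace is a list of sets of atomic propositions. A witness for
    the quantified proposition b lets b hold for the first time at p.
    """
    bound = 0
    for trace in traces:
        first = next(
            (i for i, step in enumerate(trace) if event in step), None
        )
        if first is None:
            return None  # the event is missing on this prefix
        bound = max(bound, first)
    return bound
```

Because b is quantified before the trace variable, the same witness has to work for all traces, which is why the maximum over all first occurrences is taken.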

This paper is structured as follows. In Sect. 2, we give necessary preliminaries. 
In Sect.3, we define HyperQPTL. We discuss an alternative approach to define 
a logic expressing ω-regular hyperproperties, before pointing out that its model
checking problem is undecidable. Subsequently, we give examples for the expres- 
siveness of HyperQPTL, namely by characterizing the type of promptness prop- 
erties HyperQPTL can express. Additionally, we recapitulate how HyperQPTL 
also subsumes epistemic properties. Section 4 discusses the realizability problem 
of HyperQPTL. We describe HyperQPTL fragments in terms of their quantifier
prefixes. To present our results, we use the following notation. We write $\forall_\pi$
and $\forall_q$ for a single universal trace and propositional quantifier, respectively. To
denote a sequence of universal trace and propositional quantifiers, we write $\forall^*_\pi$
and $\forall^*_q$. Furthermore, we use $\forall^*_{\pi/q}$ for a sequence of mixed universal
quantification. We use the analogous notation for existential quantifiers. Lastly, $Q^*_\pi$ and
$Q^*_q$ denote a sequence of mixed universal and existential trace and propositional
quantifiers, respectively. As an example, the $\forall^*_\pi Q^*_q$ fragment denotes all formulas
of the form $\forall \pi_1. \ldots \forall \pi_n.\ \exists/\forall q_1. \ldots \exists/\forall q_m.\ \varphi$, where $\varphi$ is quantifier free. Figure 1
summarizes our results. We establish that a major factor for the decidability of
the realizability problem consists in the number of universal trace quantifiers occurring in a
formula. Realizability of HyperQPTL formulas without $\forall_\pi$ quantifiers is
decidable (Sect. 4.1). Formulas with a single $\forall_\pi$ are decidable if they belong to the
$\exists^*_{\pi/q} \forall_\pi Q^*_q$ fragment. This fragment also contains promptness. For more than
one universal trace quantifier, we show that decidability can be guaranteed for a
fragment that we call the linear $\forall^*_\pi Q^*_q$ fragment. We also show that all the above
fragments are tight, i.e., realizability of all other formulas is in general
undecidable. Lastly, Sect. 5 presents experiments for the prototype implementation of
our bounded synthesis algorithm for HyperQPTL.


2 Preliminaries 


We use AP for a set of atomic propositions. A trace over AP is an infinite 
sequence $t \in (2^{AP})^\omega$. For $i \in \mathbb{N}$, we write $t[i]$ for the ith element of t and $t[i, \infty]$
for the suffix of t starting from position i. For two traces $t, t'$ over AP and a set
$AP' \subseteq AP$, we write $t =_{AP'} t'$ to indicate that t and $t'$ agree on all $a \in AP'$, and
respectively $T =_{AP'} T'$ for two sets of traces T and $T'$. Furthermore, we define
a replacement function $t[q \mapsto t_q]$ that given a trace t and a trace $t_q \in (2^{\{q\}})^\omega$,
replaces the occurrences of q in t according to $t_q$, such that $t[q \mapsto t_q] =_{\{q\}} t_q$
and $t[q \mapsto t_q] =_{AP \setminus \{q\}} t$. We also lift this notation to sets of traces and define
$T[q \mapsto t_q] = \{ t[q \mapsto t_q] \mid t \in T \}$.
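
The replacement function can be illustrated on finite trace prefixes. In the Python sketch below, a trace is a tuple of frozensets of atomic propositions; this representation and the function names are assumptions made for the example.

```python
def replace_proposition(trace, q, q_trace):
    """Finite-prefix version of t[q -> t_q].

    Keeps every proposition except q from `trace` and takes the
    valuation of q from `q_trace`, position by position.
    """
    return tuple(
        (step - {q}) | (q_step & {q})
        for step, q_step in zip(trace, q_trace)
    )


def replace_in_set(traces, q, q_trace):
    """Lift the replacement to a set of traces: T[q -> t_q]."""
    return {replace_proposition(t, q, q_trace) for t in traces}
```

After the lifted replacement, all traces in the resulting set agree on q at every position, while all other propositions are left untouched.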

QPTL [31] extends Linear Temporal Logic (LTL) with quantification over 
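propositions. QPTL formulas $\varphi$ are defined as follows.

\begin{align*}
\varphi &::= \exists q.\, \varphi \mid \forall q.\, \varphi \mid \psi \\
\psi &::= q \mid \neg\psi \mid \psi \vee \psi \mid \bigcirc\psi \mid \Diamond\psi
\end{align*}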


where q € AP and AP is a set of atomic propositions. For simplicity, we assume 
that variable names in formulas are cleared of double occurrences. The semantics 
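of $\varphi$ over AP is defined with respect to a trace $t \in (2^{AP})^\omega$:

\begin{align*}
t \models q &\quad\text{iff}\quad q \in t[0] \\
t \models \neg\psi &\quad\text{iff}\quad t \not\models \psi \\
t \models \psi_1 \vee \psi_2 &\quad\text{iff}\quad t \models \psi_1 \text{ or } t \models \psi_2 \\
t \models \bigcirc\psi &\quad\text{iff}\quad t[1, \infty] \models \psi \\
t \models \Diamond\psi &\quad\text{iff}\quad \exists i \geq 0.\ t[i, \infty] \models \psi \\
t \models \exists q.\, \varphi &\quad\text{iff}\quad \exists t_q \in (2^{\{q\}})^\omega.\ t[q \mapsto t_q] \models \varphi \\
t \models \forall q.\, \varphi &\quad\text{iff}\quad \forall t_q \in (2^{\{q\}})^\omega.\ t[q \mapsto t_q] \models \varphi
\end{align*}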

We did not define the until operator U as native part of the logic. It can be 
derived using propositional quantification [23]. The boolean connectives $\wedge$, $\rightarrow$,
and the temporal operators globally $\Box$ and release $\mathcal{R}$ are derived as usual.
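
For reference, the standard derivations of these operators read as follows. They are the textbook definitions rather than something specific to this paper; $\mathcal{U}$ denotes the until operator, which, as stated above, is itself obtained via propositional quantification [23].

```latex
\begin{align*}
\psi_1 \wedge \psi_2 &\equiv \neg(\neg\psi_1 \vee \neg\psi_2) \\
\psi_1 \rightarrow \psi_2 &\equiv \neg\psi_1 \vee \psi_2 \\
\Box\psi &\equiv \neg\Diamond\neg\psi \\
\psi_1 \,\mathcal{R}\, \psi_2 &\equiv \neg(\neg\psi_1 \,\mathcal{U}\, \neg\psi_2)
\end{align*}
```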


3 ω-Regular Hyperproperties


Just like LTL, HyperLTL cannot express ω-regular languages [29]. LTL can
be extended to QPTL by adding quantification over atomic propositions. In
QPTL, ω-regular languages become expressible. We therefore study HyperQPTL
[6,29], the extension of HyperLTL with propositional quantification, to express
ω-regular hyperproperties. Given a set AP of atomic propositions and a set $\mathcal{V}$ of
trace variables, the syntax of HyperQPTL is defined as follows:


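\begin{align*}
\varphi &::= \forall \pi.\, \varphi \mid \exists \pi.\, \varphi \mid \forall q.\, \varphi \mid \exists q.\, \varphi \mid \psi \\
\psi &::= a_\pi \mid q \mid \neg\psi \mid \psi \vee \psi \mid \bigcirc\psi \mid \Diamond\psi
\end{align*}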


where $a, q \in AP$ and $\pi \in \mathcal{V}$. As for QPTL, we assume that formulas are
cleared of double occurrences of variable names. We require that in well-defined
HyperQPTL formulas, each $a_\pi$ is in the scope of a trace quantifier binding $\pi$
and each q is in the scope of a propositional quantifier binding q. Note that
atomic propositions $a_\pi$ refer to a quantified trace $\pi$, whereas quantified
propositional variables q are independent of the traces. The semantics of a
well-defined HyperQPTL formula over AP is defined with respect to a set of traces
$T \subseteq (2^{AP})^\omega$ and an assignment function $\Pi : \mathcal{V} \to T$. We define the satisfaction
relation $\Pi, i \models_T \varphi$ as follows:
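\begin{align*}
\Pi, i \models_T a_\pi &\quad\text{iff}\quad a \in \Pi(\pi)[i] \\
\Pi, i \models_T q &\quad\text{iff}\quad \forall t \in T.\ q \in t[i] \\
\Pi, i \models_T \neg\psi &\quad\text{iff}\quad \Pi, i \not\models_T \psi \\
\Pi, i \models_T \psi_1 \vee \psi_2 &\quad\text{iff}\quad \Pi, i \models_T \psi_1 \text{ or } \Pi, i \models_T \psi_2 \\
\Pi, i \models_T \bigcirc\psi &\quad\text{iff}\quad \Pi, i+1 \models_T \psi \\
\Pi, i \models_T \Diamond\psi &\quad\text{iff}\quad \exists j \geq i.\ \Pi, j \models_T \psi \\
\Pi, i \models_T \exists \pi.\, \varphi &\quad\text{iff}\quad \exists t \in T.\ \Pi[\pi \mapsto t], i \models_T \varphi \\
\Pi, i \models_T \forall \pi.\, \varphi &\quad\text{iff}\quad \forall t \in T.\ \Pi[\pi \mapsto t], i \models_T \varphi \\
\Pi, i \models_T \exists q.\, \varphi &\quad\text{iff}\quad \exists t_q \in (2^{\{q\}})^\omega.\ \Pi, i \models_{T[q \mapsto t_q]} \varphi \\
\Pi, i \models_T \forall q.\, \varphi &\quad\text{iff}\quad \forall t_q \in (2^{\{q\}})^\omega.\ \Pi, i \models_{T[q \mapsto t_q]} \varphi
\end{align*}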


Note that the semantics of propositional quantification is defined in such a way 
that in the scope of a quantifier binding q, all traces agree on their q-sequence. We 
say that a set of traces T satisfies a HyperQPTL formula $\varphi$ if $\emptyset, 0 \models_T \varphi$, where $\emptyset$
is the empty trace assignment. QPTL formulas can be expressed in HyperQPTL
using a single universal trace quantifier. Furthermore, HyperLTL [4] is the
syntactic subset of HyperQPTL that does not contain propositional quantification.

While HyperQPTL can express a wide range of properties (see Sect. 3.1), 
its model checking problem is still decidable [29]. Furthermore, the syntactic 
fragments for which satisfiability is decidable can be expressed solely in terms 
of the occurring trace quantifiers: Just like for HyperLTL, satisfiability of a
HyperQPTL formula is decidable if no $\forall_\pi$ is followed by an $\exists_\pi$ [6].
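
Since this criterion only looks at the trace quantifiers in the prefix, it is easy to check syntactically. The following Python sketch does so for a prefix given as a list of pairs; the encoding of quantifiers as strings and the function name are assumptions made for the example.

```python
def in_decidable_sat_fragment(prefix):
    """Check that no universal trace quantifier is followed by an
    existential trace quantifier.

    `prefix` is a sequence of (quantifier, kind) pairs, where quantifier
    is "forall" or "exists" and kind is "trace" or "prop".
    """
    seen_universal_trace = False
    for quantifier, kind in prefix:
        if kind != "trace":
            continue  # propositional quantifiers do not affect the criterion
        if quantifier == "forall":
            seen_universal_trace = True
        elif quantifier == "exists" and seen_universal_trace:
            return False
    return True
```

Propositional quantifiers are skipped entirely, reflecting the observation that they do not influence the decidability of satisfiability.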

The definition of HyperQPTL is straightforward; however, one could argue
that it is not the only way to extend QPTL to a hyperlogic. The original idea 
of QPTL is to “color” the trace by introducing additional atomic propositions. 
The way HyperQPTL is defined, that idea is translated to sets of traces by 
coloring the traces uniformly. An alternative approach could be to color every 
trace individually by introducing a full atomic proposition for every proposi- 
tional quantification. This resembles full second-order quantification and would 
therefore result in a considerably more expressive logic. In particular, we show 
that the model checking problem would become undecidable, which is, especially 
in the context of automatic verification, unfavorable. For the remainder of this 
section, we call the logic resulting from the alternative definition HyperQPTL*. 
The syntax of HyperQPTL* is similar to the one of HyperQPTL, just without 
the rule q for the evaluation of the propositional variables. This accounts for 
the idea that the propositional quantification can freely reassign atomic propo- 
sitions; thus, there is no need to distinguish between free atomic propositions 
and quantified atomic propositions: 


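$$\varphi ::= \forall\pi.\, \varphi \mid \exists\pi.\, \varphi \mid \forall a.\, \varphi \mid \exists a.\, \varphi \mid \psi$$
$$\psi ::= a_\pi \mid \neg\psi \mid \psi \vee \psi \mid \bigcirc\psi \mid \Diamond\psi$$

Semantically, only the rules for the quantification of the propositional quantifiers
change:


$$\Pi, i \models_T \exists a.\, \varphi \quad\text{iff}\quad \exists T' \subseteq (2^{AP})^\omega.\ T' =_{AP \setminus \{a\}} T \wedge \Pi, i \models_{T'} \varphi$$
$$\Pi, i \models_T \forall a.\, \varphi \quad\text{iff}\quad \forall T' \subseteq (2^{AP})^\omega.\ T' =_{AP \setminus \{a\}} T \rightarrow \Pi, i \models_{T'} \varphi$$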


Lemma 1. The HyperQPTL* model checking problem is undecidable.


Proof. Given a finite Kripke structure $K$ and a HyperQPTL* formula $\varphi$, the
model checking problem asks whether the trace set $T$ produced by $K$ satisfies
$\varphi$. The proof follows the undecidability proof for the model checking problem
of S1S[E] [6], a logic which lifts S1S to the level of hyperlogics. We describe
a reduction from the halting problem of 2-counter machines (which are Turing
complete) to the HyperQPTL* model checking problem. A 2-counter machine
(2CM) consists of a finite set of serially numbered instructions that modify
two counters. A configuration of a 2CM is a triple $(n, v_1, v_2) \in \mathbb{N}^3$, where $n$
determines the next instruction to be executed, and $v_1$ and $v_2$ assign the counter
values. Each instruction can either increase or decrease one of the counters, or
test either of the counters for zero and, depending on the outcome, jump to
another instruction. Furthermore, we assume a special instruction $i_{halt}$, which
indicates that the machine has reached a halting state. A 2CM halts from initial
configuration $s_0$ if there is a finite sequence $s_0, \ldots, s_n$ of configurations such
that $s_n$ is a halting configuration and $s_{i+1}$ is the result of applying the instruction
in $s_i$ to configuration $s_i$. Let $M$ be a 2CM. We describe $T$ and $\varphi$ such that
$T \models \varphi$ iff $M$ halts. We choose $AP = \{i, c_1, c_2\}$ and $T$ is the set of all traces
where each atomic proposition holds exactly once. That way, a trace $t$ encodes
a configuration of the machine: If $i \in t[n]$, $c_1 \in t[v_1]$, and $c_2 \in t[v_2]$, the machine
is in configuration $(n, v_1, v_2)$. It is easy to see that $T$ can be produced by a finite
Kripke structure. To describe $\varphi$, we make two helpful observations. First, using
propositional quantification, we can quantify a trace set $T_q \subseteq T$: a trace $t$ is in
$T_q$ iff the quantified proposition $q$ eventually occurs on $t$. Second, for two traces
$t, t' \in T$, we can state that $t'$ encodes a configuration which is the successor of
the configuration encoded by $t$. Using these observations, we define $\varphi = \exists q.\, \varphi'$,
where $q$ encodes a set $T_q \subseteq T$ that is supposed to describe a halting computation.
To ensure that $T_q$ describes a halting computation, $\varphi'$ is a conjunction of the
following requirements: $T_q$ must


1. be finite, 

2. contain a halting configuration and the initial configuration, 

3. be predecessor closed with respect to the encoded configurations it contains 
(except for the initial configuration). 


Finiteness of $T_q$ can be expressed by stating that there is an upper bound on the
values of $i$, $c_1$, and $c_2$ on the traces in $T_q$. With the observations made before,
stating the above requirements in HyperQPTL* now remains a straightforward
exercise.
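
The trace encoding of configurations used in the proof can be made concrete with a small sketch. The Python code below is an illustration only and not part of the original construction: traces are cut to finite prefixes, the names `encode`, `decode`, `step`, and `is_successor` are hypothetical, and the instruction format (tuples such as `("inc", 0)`, with control moving to instruction `n + 1` after an increment or decrement) is an assumption made for this example.

```python
from typing import Dict, List, Set, Tuple

Config = Tuple[int, int, int]  # (n, v1, v2): next instruction and the two counter values

def encode(config: Config) -> List[Set[str]]:
    """Finite prefix of the trace over AP = {i, c1, c2} that encodes `config`."""
    n, v1, v2 = config
    trace: List[Set[str]] = [set() for _ in range(max(config) + 1)]
    trace[n].add("i")
    trace[v1].add("c1")
    trace[v2].add("c2")
    return trace

def decode(trace: List[Set[str]]) -> Config:
    """Recover (n, v1, v2) from the unique positions of i, c1, and c2."""
    pos: Dict[str, int] = {}
    for k, letter in enumerate(trace):
        for ap in letter:
            assert ap not in pos, "each proposition must hold exactly once"
            pos[ap] = k
    return (pos["i"], pos["c1"], pos["c2"])

def step(program, config: Config) -> Config:
    """One machine step; the instruction format is an assumption of this sketch."""
    n, v1, v2 = config
    op = program[n]
    counters = [v1, v2]
    if op[0] == "inc":
        counters[op[1]] += 1
        return (n + 1, counters[0], counters[1])
    if op[0] == "dec":
        counters[op[1]] = max(0, counters[op[1]] - 1)
        return (n + 1, counters[0], counters[1])
    if op[0] == "jz":  # jump to op[2] if the tested counter is zero
        return (op[2] if counters[op[1]] == 0 else n + 1, v1, v2)
    return config  # "halt": the configuration no longer changes

def is_successor(program, t: List[Set[str]], t_next: List[Set[str]]) -> bool:
    """True iff t_next encodes the successor of the configuration encoded by t."""
    return decode(t_next) == step(program, decode(t))
```

In this reading, the set $T_q$ selected by the quantified proposition corresponds to a finite collection of such encoded traces that contains the initial and a halting configuration and in which every non-initial member has a predecessor according to `is_successor`.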
Since the model checking problem of HyperQPTL* is undecidable, we focus
on HyperQPTL to express $\omega$-regular hyperproperties. In particular, we show
that HyperQPTL can express a range of relevant properties that are neither
expressible in HyperLTL, nor in QPTL.


3.1 The Expressiveness of HyperQPTL 


HyperQPTL combines trace quantification with $\omega$-regularity. The interplay
between the two features enables HyperQPTL to express a variety of properties.
In Sect. 1, we showed how HyperQPTL can express a form of promptness.
In this section, we further elaborate on the type of properties HyperQPTL can
express. In particular, we compare it to Prompt-LTL, a logic that extends LTL
with bounded eventualities. Furthermore, HyperQPTL is also able to express
epistemic properties by emulating the knowledge operator known from LTL$_K$.

A straightforward class of properties HyperQPTL can express are $\omega$-regular
properties over $n$-tuples of quantified traces. Formulas expressing this type of
properties first have a trace quantifier prefix followed by a QPTL formula, i.e.,
they lie in the $Q^*_\pi Q^*_q$ fragment. This fragment of HyperQPTL corresponds to the
extension of QPTL with prenex trace quantification. However, the true expressive
power of HyperQPTL originates from the fact that we allow the trace quantifiers
and propositional quantifiers to alternate.


Promptness Properties. Promptness properties are an example of HyperQPTL's
interplay between trace quantification and propositional quantification.
Promptness expresses that eventualities are fulfilled within a bounded number of steps.
One way to express promptness properties is the logic Prompt-LTL, which
extends LTL with the promptness operator $\Diamond_p$. A system satisfies a
Prompt-LTL formula $\varphi$ if there is a bound $k$ such that all traces of the system fulfill the
formula where each $\Diamond_p$ in $\varphi$ is replaced by $\Diamond^{\leq k}$, i.e., the system must fulfill all
prompt eventualities within $k$ steps. For example, $\varphi = \Box\Diamond_p \psi$ holds in a system
if there is a bound $k$ such that all traces of the system at all times satisfy $\psi$
within $k$ steps. HyperQPTL can express a different type of promptness properties.
In Sect. 1, Formula 2, we showed how one can state in HyperQPTL that
there is a bound, common for all traces, until which an eventuality has to be
fulfilled. The idea is to quantify a new proposition $b$, such that the first position
in which $b$ is true serves as the bound. Compared to Prompt-LTL, HyperQPTL
thus expresses a weaker form of promptness, while still being stronger than pure
eventuality. This type of promptness only becomes meaningful when comparing
several traces of the system: HyperQPTL can enforce that there is a common
bound for all traces (the system cannot starve), but it does not make the bound
explicit. The following example shows a more involved promptness property
expressible in HyperQPTL.


Example 1. HyperQPTL can express bounded waiting for a grant. It states that
if the system requests access to a shared resource at point in time $t$, then it will
be granted access within a bounded amount of time. The bound may depend on
the point in time $t$ where access to the resource was requested. However, it may
not depend on the current trace. We express this property in HyperQPTL as
follows, also adding that the system will not request access twice without being
granted access in between.


$$\forall\pi.\ \Box(r_\pi \rightarrow \bigcirc(\neg r_\pi\, \mathcal{W}\, g_\pi)) \qquad (1)$$


$$\forall\pi.\, \exists b.\, \forall\pi'.\ \Box(r_\pi \wedge r_{\pi'} \rightarrow \bigcirc(\Diamond b \wedge (\neg b\, \mathcal{U}\, g_\pi) \wedge (\neg b\, \mathcal{U}\, g_{\pi'}))) \qquad (2)$$


Formula 1 states that no second request is posed before being given a grant.
Formula 2 expresses the bounded waiting property by universally quantifying a
trace, then existentially quantifying a sequence of bounds $b$. Now, for every trace
$\pi'$, whenever $\pi$ and $\pi'$ pose a request at the same point in time, both have to get
access to the resource before $b$ holds next. Therefore, for each point in time, there
is a bound such that all traces posing a request at that point in time get access
within a bounded number of steps. Note that this property differs from saying
"all traces are eventually granted access", where the bound may also depend on
the trace under consideration. In this scenario, each of the infinitely many traces
could wait arbitrarily long for the grant. In particular, it could happen that with
each trace the waiting time is longer than before.
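
To illustrate the difference between a bound that depends on the request time and one that depends on the trace, the following Python sketch computes, for a finite set of finite trace prefixes, the smallest bound shared by all traces that request at the same position. It is a simplified illustration and not the semantics of HyperQPTL: the names `wait_time` and `common_bounds` are hypothetical, traces are finite lists of sets over the propositions `r` (request) and `g` (grant), and grants are assumed to arrive strictly after the request.

```python
from typing import Dict, List, Optional, Set

Trace = List[Set[str]]

def wait_time(trace: Trace, t: int) -> Optional[int]:
    """Steps after position t until the first grant, or None if no grant is visible."""
    for k in range(t + 1, len(trace)):
        if "g" in trace[k]:
            return k - t
    return None

def common_bounds(traces: List[Trace]) -> Dict[int, Optional[int]]:
    """For each position t with a request: the smallest bound that works for
    every trace requesting at t (None if some request is never granted)."""
    bounds: Dict[int, Optional[int]] = {}
    horizon = max(len(tr) for tr in traces)
    for t in range(horizon):
        waits = [wait_time(tr, t) for tr in traces if t < len(tr) and "r" in tr[t]]
        if not waits:
            continue  # nobody requests at t, so no bound is needed
        bounds[t] = None if any(w is None for w in waits) else max(waits)
    return bounds
```

On a finite set of traces such a maximum always exists when every request is granted. The property becomes restrictive for infinitely many traces, where the waiting times at a fixed position could grow without bound.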


The above example shows how the interplay of trace quantifiers and propositional
quantifiers can be leveraged to express a new class of promptness properties.
We finally note that, compared to Prompt-LTL, HyperQPTL cannot express
that all eventualities must be fulfilled within a fixed number $k$ of steps.


Corollary 1. The expressiveness of HyperQPTL and Prompt-LTL is incompa- 
rable. 


Epistemic Properties. Another interesting class of properties that are not
expressible in HyperLTL are epistemic properties. Epistemic properties describe
the knowledge of agents that interact with each other in a system. Logics that
express epistemic properties are often equipped with a so-called knowledge operator,
e.g., LTL$_K$, which is LTL extended with the knowledge operator $K_A \varphi$. The
operator denotes that an agent $A \subseteq AP$ knows $\varphi$. An agent $A$ is characterized in
terms of the atomic propositions he can observe. The semantics of the operator
is described with the following rule


$$t, i \models K_A \varphi \quad\text{iff}\quad \forall t'.\ t[0, i] =_A t'[0, i] \rightarrow t', i \models \varphi.$$


The formula is evaluated with respect to a trace $t$ and a position $i$. We omit
the semantic definition for the rest of the logic, which corresponds to plain LTL.
The semantic definition of the operator captures the idea that an agent knows
some fact $\varphi$ if $\varphi$ holds on all traces that are indistinguishable for the agent.
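
The rule for the knowledge operator can be read operationally on a finite set of finite trace prefixes. The following Python sketch is an illustration under these assumptions only: the function name `knows` is hypothetical, the quantification over all traces is restricted to the explicitly given list `traces`, and the fact is passed as a Python predicate `phi(trace, i)` instead of a formula.

```python
from typing import Callable, List, Set

Trace = List[Set[str]]

def knows(traces: List[Trace], t: Trace, i: int, agent: Set[str],
          phi: Callable[[Trace, int], bool]) -> bool:
    """t, i |= K_A phi, restricted to the given finite set of traces:
    phi must hold at position i on every trace whose prefix [0, i]
    agrees with t on the propositions observable by the agent."""
    observed = [letter & agent for letter in t[: i + 1]]
    return all(
        phi(u, i)
        for u in traces
        if [letter & agent for letter in u[: i + 1]] == observed
    )
```

For instance, an agent that only observes a proposition `out` cannot know which of two participants paid if both alternatives produce the same observation, but it can know that one of them paid.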


Fig. 2. The dining cryptographers problem with three cryptographers.


Example 2 (Dining Cryptographers). The dining cryptographers problem [3] is
an interesting example of how epistemic properties can characterize non-trivial
protocols. The problem describes the following situation (see Fig. 2): three
cryptographers $C_1$, $C_2$, and $C_3$ sit at a table in a restaurant and either one of the
cryptographers or, alternatively, the NSA paid for their meal. The task for the
cryptographers is to figure out whether the NSA or one of the cryptographers paid.
However, if one of the cryptographers paid, then the others must not be able
to infer who it was. Each cryptographer $C_i$ receives several bits of information:
$\mathit{paid}_i$, indicating whether or not he pays the bill, and two secrets, each shared
with one of the other cryptographers. The secrets can be used to encode the
information they share as output $\mathit{out}_i$. By combining the outputs of all
cryptographers, it must become clear whether the NSA or one of the group paid. The
specification of the protocol can be easily formalized in LTL$_K$. The following
formula describes the desired behavior of agent $C_1$:


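$$\mathit{DC}_{\mathit{agent}_1} := (\mathit{paid}_{\mathit{group}} \wedge \neg\mathit{paid}_1 \rightarrow (K_{C_1}(\mathit{paid}_2 \vee \mathit{paid}_3) \wedge \neg K_{C_1}\mathit{paid}_2 \wedge \neg K_{C_1}\mathit{paid}_3))$$
$$\wedge\ (\mathit{paid}_{\mathit{NSA}} \rightarrow K_{C_1}(\neg\mathit{paid}_1 \wedge \neg\mathit{paid}_2 \wedge \neg\mathit{paid}_3)).$$

The knowledge operator can also be defined for hyperlogics [29]. It receives an
additional parameter $\pi$, indicating the trace the knowledge refers to. When added
to HyperQPTL, it has the following semantics: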


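$$\Pi, i \models_T K_{A,\pi}\, \varphi \quad\text{iff}\quad \forall t' \in T.\ \Pi(\pi)[0, i] =_A t'[0, i] \rightarrow \Pi[\pi \mapsto t'], i \models_T \varphi.$$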


The knowledge operator, however, can be encoded in HyperQPTL using propo- 
sitional quantification. Epistemic problems, such as the dining cryptographers 
problem, can thus be expressed in HyperQPTL. 


Theorem 1 ([29]). HyperQPTL can emulate the knowledge operator.

Proof. We recap the proof from [29]: Let $\varphi = Q_{\pi/q} \cdots Q_{\pi/q}.\, \varphi'$ be a HyperQPTL
formula, equipped with the knowledge operator as defined above. We assume
that $\varphi$ is given in negation normal form, i.e., each $K_{A,\pi}$ occurs either in positive
position or in negated form. Let $u$ and $r$ be fresh propositions and let $\pi'$ be a fresh
trace variable. Recursively, we replace each knowledge operator $K_{A,\pi}\psi$ occurring
in $\varphi$ in positive position with the following formula


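$$Q_{\pi/q} \cdots Q_{\pi/q}.\, \exists u.\, \forall r.\, \forall\pi'.\ \varphi'[K_{A,\pi}\psi \mapsto u]\ \wedge$$
$$\big((r\, \mathcal{U}\, (u \wedge r \wedge \bigcirc\Box\neg r)) \wedge \Box(r \rightarrow A_\pi = A_{\pi'}) \rightarrow \Box(r \wedge \bigcirc\neg r \rightarrow \psi[\pi \mapsto \pi'])\big)$$


and each $K_{A,\pi}\psi$ occurring negatively with the following formula


$$Q_{\pi/q} \cdots Q_{\pi/q}.\, \exists u.\, \forall r.\, \exists\pi'.\ \varphi'[\neg K_{A,\pi}\psi \mapsto u]\ \wedge$$
$$\big((r\, \mathcal{U}\, (u \wedge r \wedge \bigcirc\Box\neg r)) \rightarrow \Box(r \rightarrow A_\pi = A_{\pi'}) \wedge \Box(r \wedge \bigcirc\neg r \rightarrow \neg\psi[\pi \mapsto \pi'])\big),$$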


where we use $\varphi'[K_{A,\pi}\psi \mapsto u]$ to denote that in $\varphi'$, a single occurrence of the
knowledge operator is replaced by $u$, and $\psi[\pi \mapsto \pi']$ to denote the formula
where $\pi$ is replaced by $\pi'$. The existentially quantified proposition $u$ indicates
the points in time where the knowledge operator is supposed to hold/not hold.
The universally quantified proposition $r$ is assumed to change once from $r$ to $\neg r$
and thereby point at one of the points in time picked by $u$. It is then used to
compare the prefix of the old trace $\pi$ and an alternative trace quantified by the
trace variable $\pi'$.


4 HyperQPTL Realizability 


In reactive synthesis, the task is, given a specification $\varphi$, to construct a
system that satisfies the specification. More precisely, the system is assumed to
receive some inputs from an environment and has to react with outputs such
that the specification is fulfilled. The realizability problem asks for the
existence of a so-called strategy tree, where the edges are labeled with all possible
inputs and the task is to find a function $f$ that labels the nodes with the
corresponding outputs. Figure 3 shows a strategy tree for a single input bit $i$. We
define strategies following [12]. Let a set $AP = I \mathbin{\dot\cup} O$ be given. A strategy
$f : (2^I)^* \rightarrow 2^O$ maps sequences of input valuations $2^I$ to an output valuation
$2^O$. For an infinite word $w = w_0 w_1 w_2 \cdots \in (2^I)^\omega$, the trace corresponding to a
strategy $f$ is defined as $(f(\epsilon) \cup w_0)(f(w_0) \cup w_1)(f(w_0 w_1) \cup w_2) \ldots \in (2^{I \cup O})^\omega$.
For any trace $w = w_0 w_1 w_2 \ldots \in (2^{I \cup O})^\omega$ and strategy $f : (2^I)^* \rightarrow 2^O$, we lift
the set containment operator $\in$, defining that $w \in f$ iff $f(\epsilon) = w_0 \cap O$ and
$f((w_0 \cap I) \cdots (w_i \cap I)) = w_{i+1} \cap O$ for all $i \geq 0$. We say that a strategy $f$ satisfies
a HyperQPTL formula $\varphi$ over $AP = I \mathbin{\dot\cup} O$ iff $\{w \mid w \in f\}$ satisfies $\varphi$.
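
The two definitions above (the trace corresponding to a strategy, and membership of a trace in a strategy) can be sketched on finite prefixes. The Python code below is illustrative only: the names `trace_prefix` and `member_prefix` are hypothetical, strategies are modeled as Python functions from tuples of input valuations to an output valuation, and infinite words are truncated to finite lists.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

Valuation = FrozenSet[str]
Strategy = Callable[[Tuple[Valuation, ...]], Valuation]

def trace_prefix(f: Strategy, inputs: Sequence[Valuation]) -> List[Valuation]:
    """Prefix (f(eps) | w0)(f(w0) | w1)(f(w0 w1) | w2)... for a finite input word."""
    return [f(tuple(inputs[:k])) | inputs[k] for k in range(len(inputs))]

def member_prefix(w: Sequence[Valuation], f: Strategy,
                  I: Valuation, O: Valuation) -> bool:
    """Finite-prefix version of 'w in f': the outputs at position k must equal
    the strategy's answer to the inputs seen at positions 0..k-1."""
    return all(
        f(tuple(x & I for x in w[:k])) == w[k] & O
        for k in range(len(w))
    )
```

A usage example with a strategy that echoes the previous input bit `i` as output `o`:

```python
I, O = frozenset({"i"}), frozenset({"o"})
echo = lambda hist: frozenset({"o"}) if hist and "i" in hist[-1] else frozenset()
w = trace_prefix(echo, [frozenset({"i"}), frozenset(), frozenset({"i"})])
# w is [{"i"}, {"o"}, {"i"}] and member_prefix(w, echo, I, O) holds
```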

With the definition of a strategy at hand, we can define the realizability 
problem of HyperQPTL formally. 


Definition 1 (HyperQPTL Realizability). A HyperQPTL formula $\varphi$ over
atomic propositions $AP = I \mathbin{\dot\cup} O$ is realizable if there is a strategy $f : (2^I)^* \rightarrow 2^O$
that satisfies $\varphi$.

Fig. 3. A strategy tree for the reactive realizability problem.


For technical reasons, we assume (without loss of generality) that quantified 
atomic propositions are classified as outputs, not inputs. This complies with the 
intuition that propositional quantifiers should be a means for additional expres- 
siveness; they should not overwrite the inputs received from the environment. 
The definition of realizability of QPTL and HyperLTL specifications is inherited
from the definition for HyperQPTL. 

Compared to the standard realizability problem, the distributed realizability 
problem is defined over an architecture, containing a number of processes inter- 
acting with each other. The goal is to find a strategy for each of the processes. 
In the following proofs, we will make use of the distributed realizability problem 
of QPTL, which we therefore also define formally. 

A distributed architecture [17,27] $A$ over atomic propositions $AP$ is a tuple
$(P, p_{env}, \mathcal{I}, \mathcal{O})$, where $P$ is a finite set of processes and $p_{env} \in P$ is a designated
environment process. The functions $\mathcal{I} : P \rightarrow 2^{AP}$ and $\mathcal{O} : P \rightarrow 2^{AP}$ define the
inputs and outputs of processes. The output of one process can be the input of
another process. The outputs of the processes must be pairwise disjoint, i.e., for
all $p \neq p' \in P$ it holds that $\mathcal{O}(p) \cap \mathcal{O}(p') = \emptyset$. We assume that the
environment process forwards inputs to the processes and has no input of its own, i.e.,
$\mathcal{I}(p_{env}) = \emptyset$.


Definition 2 (Distributed QPTL Realizability [17]). A QPTL formula
$\varphi$ over free atomic propositions $AP$ is realizable in an architecture $A =
(P, p_{env}, \mathcal{I}, \mathcal{O})$ if for each process $p \in P$, there is a strategy $f_p : (2^{\mathcal{I}(p)})^* \rightarrow 2^{\mathcal{O}(p)}$
such that the combination of all $f_p$ satisfies $\varphi$.


The distributed realizability problem for QPTL is (inherited from LTL) in
general undecidable [27]. However, we will use the result that the problem remains
decidable for architectures without information forks [17]. The notion of
information forks captures the flow of data in the system. Intuitively, an
architecture contains an information fork if the processes cannot be ordered linearly
according to their informedness. Formally, an information fork in an architecture
$A = (P, p_{env}, \mathcal{I}, \mathcal{O})$ is defined as a tuple $(P', V', p, p')$, where $p, p'$ are two
different processes, $P' \subseteq P$, and $V' \subseteq AP$ is disjoint from $\mathcal{I}(p) \cup \mathcal{I}(p')$. $(P', V', p, p')$
is an information fork if $P'$ together with the edges that are labeled with at
least one variable from $V'$ forms a subgraph rooted in the environment and
there exist two nodes $q, q' \in P'$ that have edges to $p, p'$, respectively, such that
$\mathcal{O}(q) \cap \mathcal{I}(p) \not\subseteq \mathcal{I}(p')$ and $\mathcal{O}(q') \cap \mathcal{I}(p') \not\subseteq \mathcal{I}(p)$. The definition formalizes the
intuition that $p$ and $p'$ receive incomparable input bits, i.e., they have incomparable
information.


Fig. 4. Distributed architectures. (a) Information fork: An architecture with
two processes; process $p$ produces output $o$ from input $i$ and $p'$ produces output
$o'$ from input $i'$. (b) No information fork: The same architecture as on the left,
where the inputs of process $p'$ are changed to $i$ and $i'$.


Example 3. Two example architectures are depicted in Fig. 4 [12]. The processes
in Fig. 4a receive distinct inputs and thus neither process is more informed than
the other. The architecture therefore contains an information fork with $P' =
\{env, p, p'\}$, $V' = \{i, i'\}$, $q = env$, $q' = env$. The processes in Fig. 4b can be
ordered linearly according to the subset relation on the inputs and thus the
architecture contains no information fork.
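
The intuition of ordering processes linearly by informedness can be sketched for architectures in which all processes read directly from the environment, as in Fig. 4. The following Python snippet is a deliberate simplification and not the formal information-fork definition: the function name `linearly_ordered` is hypothetical, and it only checks whether the input sets of the processes form a chain under set inclusion.

```python
from itertools import combinations
from typing import Dict, Set

def linearly_ordered(inputs: Dict[str, Set[str]]) -> bool:
    """True iff the input sets of all processes are pairwise comparable
    by inclusion, i.e., the processes can be ordered by informedness."""
    return all(a <= b or b <= a for a, b in combinations(inputs.values(), 2))
```

For the architecture of Fig. 4a, `linearly_ordered({"p": {"i"}, "p'": {"i'"}})` is false, while for Fig. 4b, `linearly_ordered({"p": {"i"}, "p'": {"i", "i'"}})` is true. For general architectures with communication between processes, the graph-based definition above has to be used.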


In the following sections, we identify tight syntactic fragments of HyperQPTL 
for which the standard realizability problem is decidable. We give decidability 
proofs and show that formulas outside the decidable fragments are in general 
undecidable. An important aspect for decidability is the number of universal 
trace quantifiers that appear in the formula. We thus present our findings in three 
categories, depending on the number of universal trace quantifiers a formula has. 


4.1 No Universal Trace Quantifier 


We show that the realizability problem of any HyperQPTL formula without a
$\forall\pi$ quantifier is decidable. The problem is reduced to QPTL realizability.


Theorem 2. Realizability of the $(\exists^*_\pi Q^*_q)^*$ fragment of HyperQPTL is decidable.


Proof. Let a $(\exists^*_\pi Q^*_q)^*$ HyperQPTL formula $\varphi$ over $AP = I \mathbin{\dot\cup} O = \{a^0, \ldots, a^k\}$
with trace quantifiers $\pi_0, \ldots, \pi_n$ be given. We reduce the problem to the
realizability problem of QPTL, which is known to be decidable (since QPTL formulas
can be translated to Büchi automata). The idea is to replace each existential
trace quantifier $\exists\pi_i$ with quantification of propositions $a^0_{\pi_i}, a^1_{\pi_i}, \ldots, a^k_{\pi_i}$, one for
each $a^j \in AP$, thereby mimicking the quantification of a trace. To make sure
that only traces from an actual strategy tree are chosen, we add a dependency
formula which forces the outputs to be dependent on the inputs. The following
QPTL formula implements the idea.


$$\varphi_{QPTL} := \varphi[i \leq n : \exists\pi_i \mapsto \exists a^0_{\pi_i}, \ldots, \exists a^k_{\pi_i}]\ \wedge \bigwedge_{i \leq n} \bigwedge_{j \leq n} (I_{\pi_i} \neq I_{\pi_j})\, \mathcal{R}\, (O_{\pi_i} = O_{\pi_j})$$

We use the notation $[i \leq n : \exists\pi_i \mapsto \exists a^0_{\pi_i}, \ldots, \exists a^k_{\pi_i}]$ to indicate that each $\pi_i$ for
$0 \leq i \leq n$ is replaced with the respective series of existential propositional
quantification. Furthermore, we write $I_{\pi_i} \neq I_{\pi_j}$ as syntactic sugar for $\bigvee_{a \in I} a_{\pi_i} \nleftrightarrow a_{\pi_j}$
(and similarly for $O_{\pi_i} = O_{\pi_j}$). We show that $\varphi$ and $\varphi_{QPTL}$ are equirealizable.
For the first direction, assume that $\varphi$ is realizable by a strategy $f$. Notice that all
atomic propositions in $\varphi_{QPTL}$ are bound by a propositional quantifier. Therefore,
if the witness sequences for the quantified propositions can be chosen correctly,
any strategy realizes $\varphi_{QPTL}$. Propositions $a_{\pi_i}$ are chosen according to the witness
traces of $f \models \varphi$. Witnesses for the remaining atomic propositions are also chosen
according to their witnesses from $f \models \varphi$. Now, the first conjunct of $\varphi_{QPTL}$ is
fulfilled since $f \models \varphi$ holds. The second conjunct is fulfilled since any two traces
$\pi_i, \pi_j$ of a strategy tree fulfill by construction $(I_{\pi_i} \neq I_{\pi_j})\, \mathcal{R}\, (O_{\pi_i} = O_{\pi_j})$. For the
other direction, assume that $\varphi_{QPTL}$ is realizable (by construction independently
from the strategy). Let $t_{a^0_{\pi_0}}, \ldots, t_{a^k_{\pi_n}}$ be the witness sequences for the respective
quantified atomic propositions. The following strategy realizes $\varphi$.


$$f(\sigma) = \begin{cases} \{ t_{a_{\pi_i}}[|\sigma|] \mid a \in O \} & \text{if for some } i \leq n,\ \sigma = \{ t_{a_{\pi_i}}[0] \mid a \in I \} \cdots \{ t_{a_{\pi_i}}[|\sigma|] \mid a \in I \} \\ \emptyset & \text{otherwise} \end{cases}$$


Strategy $f$ chooses the outputs according to the witnesses for the propositions
encoding the traces. Note that because of the second conjunct in $\varphi_{QPTL}$, the
output is always unique, even if several encoded traces start with the same
input sequence. Now, $f \models \varphi$ holds because of the first conjunct of $\varphi_{QPTL}$.
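
The dependency conjunct used in the proof, which requires outputs to agree until inputs differ, can be checked on finite prefixes. The following Python sketch is an illustration and not part of the reduction: the function name `outputs_depend_on_inputs` is hypothetical, and the release operator is evaluated on two finite traces only, treating the end of the prefix as "not yet released".

```python
from typing import List, Set

Trace = List[Set[str]]

def outputs_depend_on_inputs(t1: Trace, t2: Trace, I: Set[str], O: Set[str]) -> bool:
    """Finite-prefix check of (I_1 != I_2) R (O_1 = O_2): the outputs of both
    traces agree at every position up to and including the first position
    at which their inputs differ."""
    for x, y in zip(t1, t2):
        if x & O != y & O:
            return False  # outputs differ before the inputs ever did
        if x & I != y & I:
            return True   # inputs differ here: the obligation is released
    return True           # inputs never differed within the prefix
```

Two traces taken from the same strategy tree always pass this check, since the output at a position only depends on the inputs at earlier positions.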


4.2 Single Universal Trace Quantifier 


In this fragment, we allow exactly one universal trace quantifier. It is particularly 
interesting as it contains many promptness properties. For example, the following 
promptness formulation mentioned in the introduction lies within the fragment: 


$$\exists b.\, \forall\pi.\ \Diamond b \wedge (\neg b\, \mathcal{U}\, e_\pi).$$

Theorem 3. Realizability of the $\exists^*_{q/\pi} \forall^*_q \forall_\pi Q^*_q$ fragment is decidable.


We show the theorem in two steps. First, we generalize a proof from [12], showing
that realizability of the $\exists^*_\pi \forall_\pi Q^*_q$ fragment is decidable. Second, we show that we
can reduce the realizability problem of any HyperQPTL formula to a formula
where some propositional quantifiers are replaced with trace quantifiers.

Fig. 5. Distributed architecture encoding existential choice of traces.


Lemma 2. Realizability of the $\exists^*_\pi \forall_\pi Q^*_q$ fragment is decidable.


Proof. The reasoning generalizes the proof in [12] showing that realizability of
$\exists^*_\pi \forall_\pi$ HyperLTL formulas is decidable. We reduce the problem to the
distributed realizability problem of QPTL without information forks, which is
decidable [17], since QPTL is subsumed by the $\mu$-calculus. Let a HyperQPTL
formula $\varphi = \exists\pi_1 \ldots \exists\pi_n.\, \forall\pi.\, \psi$ over $AP = I \mathbin{\dot\cup} O$ be given, where $\psi$ is from
the $Q^*_q$ fragment. We define a distributed architecture $A$ over an extended set
of atomic propositions $AP' = I \cup O \cup I' \cup O'$. Similarly to the proof of
Theorem 2, $I'$ and $O'$ are composed of a copy of the atomic propositions for each
existentially quantified variable $\pi_j$. Formally, $I' = \bigcup_{1 \leq j \leq n} \{ i_{\pi_j} \mid i \in I \}$ and
$O' = \bigcup_{1 \leq j \leq n} \{ o_{\pi_j} \mid o \in O \}$. Now we define $A$ as follows.


$$A := (\{p_{env}, p_1, p_2\}, p_{env}, \mathcal{I}, \mathcal{O})$$
$$\mathcal{I} := \{p_1 \mapsto \emptyset,\ p_2 \mapsto I\}$$
$$\mathcal{O} := \{p_{env} \mapsto I,\ p_1 \mapsto I' \cup O',\ p_2 \mapsto O\}$$


The architecture is displayed in Fig. 5. The idea is that process $p_1$ sets the
values of all $i_{\pi_j}$ and $o_{\pi_j}$ (for $j \leq n$) and thereby determines the choice for the
existentially quantified traces. Process $p_1$ receives no input and therefore needs
to make a deterministic choice. Process $p_2$ then solves the realizability of formula
$\forall\pi.\, \psi$. The following QPTL formula $\varphi'$ encodes the idea.


$$\varphi' := \psi' \wedge \bigwedge_{1 \leq j \leq n} (I_{\pi_j} \neq I)\, \mathcal{R}\, (O_{\pi_j} = O),$$


where $\psi'$ is defined as $\psi$, where all $a_\pi$ are replaced by $a$ (but atomic propositions
$a_{\pi_j}$ are still part of $\psi'$!). Note that QPTL formulas implicitly quantify over all
traces universally. Similarly to the proof of Theorem 2, the second conjunct
ensures that process $p_1$ encodes actual paths from the strategy tree of process
$p_2$ (which is also the strategy tree for formula $\varphi$). Thus, $\varphi'$ is realizable for the
distributed architecture $A$ iff $\varphi$ is realizable.


To state the second lemma, we need to define what it means to replace quantifiers
in a formula. Let $\varphi = Q_{\pi/q}, \ldots, Q_{\pi/q}.\, \psi$ be a HyperQPTL formula, and $J$ be
a set of indices such that for all $j \in J$, there exists a propositional quantifier
$\exists q_j$ or $\forall q_j$ in $\varphi$. Furthermore, assume that no $\pi_j$ with $j \in J$ occurs in $\varphi$ and
that $a \in AP$. We denote by $\varphi[J \mapsto_a \pi]$ the formula where each propositional
quantifier $\exists q_j$ (or $\forall q_j$, respectively) with $j \in J$ is replaced with the corresponding
trace quantifier $\exists\pi_j$ (or $\forall\pi_j$, respectively); and each $q_j$ in $\psi$ is replaced by $a_{\pi_j}$.


Lemma 3. Let any HyperQPTL formula $\varphi$ over $AP = I \mathbin{\dot\cup} O$ and a set of indices
$J$ be given. If $\varphi[J \mapsto_i \pi]$ is realizable, then so is $\varphi$, where $i \in I$ is an arbitrary
input, assuming w.l.o.g. that $I$ is non-empty.


Proof. Let $\varphi$ and $J$ be given. Formula $\varphi[J \mapsto_i \pi]$ replaces the quantification
over sequences $(2^{\{q\}})^\omega$ with trace quantification, where the trace is only used for
statements about a single input $i$. We thus exploit the fact that in the realizability
problem, there is a trace for every input sequence. Therefore, the transformed
formula is equirealizable.


Now, we have everything we need to prove Theorem 3. 


Proof (of Theorem 3). Let $\varphi$ be a HyperQPTL formula of the $\exists^*_{q/\pi} \forall^*_q \forall_\pi Q^*_q$
fragment. First, observe that in the quantifier prefix of $\varphi$, the $\forall^*_q$ quantifiers and the
$\forall_\pi$ can be swapped. The resulting formula belongs to the $\exists^*_{q/\pi} \forall_\pi Q^*_q$ fragment.
By Lemma 3, the formula can be transformed to an equirealizable formula of the
$\exists^*_\pi \forall_\pi Q^*_q$ fragment, for which realizability is decidable by Lemma 2.


Lemma 3 allows us to decide realizability of a HyperQPTL formula by replacing 
propositional quantifiers with trace quantifiers. Thus, we can reduce HyperQPTL 
realizability to HyperLTL realizability, a fact that we use in Sect. 5 to describe 
a bounded synthesis algorithm for HyperQPTL. 


Corollary 2. The realizability problem of HyperQPTL can be soundly reduced 
to the realizability problem of HyperLTL. 


Lastly, we show that the decidable fragment is tight in the class of formulas
with a single universal trace quantifier. We do so by showing that a propositional
$\forall_q \exists_q$ quantifier alternation followed by a single trace quantifier $\forall_\pi$ leads to an
undecidable realizability problem. The proof is carried out by a reduction from
Post's Correspondence Problem.


Theorem 4. Realizability is undecidable for HyperQPTL formulas with a single
$\forall_\pi$ quantifier outside the $\exists^*_{q/\pi} \forall^*_q \forall_\pi Q^*_q$ fragment.


Proof. Inherited from HyperLTL, realizability of formulas with a $\forall_\pi$ quantifier
followed by an $\exists_\pi$ quantifier is undecidable [12]. It remains to show that
realizability of formulas from the $\forall^*_q \exists^*_q \forall_\pi$ fragment is in general undecidable. We give
a reduction from Post's Correspondence Problem (PCP) [28] to a HyperQPTL
formula from the $\forall^*_q \exists^*_q \forall_\pi$ fragment. In PCP, we are given two equally long lists $\alpha$
and $\beta$ consisting of finite words from some alphabet $\Sigma$ of size $n$. PCP is the
problem to find an index sequence $(i_k)_{1 \leq k \leq K}$ with $K \geq 1$ and $1 \leq i_k \leq n$, such that
$\alpha_{i_1} \cdots \alpha_{i_K} = \beta_{i_1} \cdots \beta_{i_K}$. Intuitively, PCP is the problem of choosing a finite
sequence of domino stones (with finitely many different stones), where each stone
consists of two words $\alpha_i$ and $\beta_i$.

Fig. 6. A sketch of the strategy tree of our PCP reduction: relevant traces are marked
in green. (Color figure online)

Let a PCP instance with $\Sigma = \{a_1, a_2, \ldots, a_n\}$
and two lists $\alpha$ and $\beta$ be given. We choose our set of atomic propositions as
follows: $AP := I \mathbin{\dot\cup} O$ with $I := \{i\}$ and $O := (\Sigma \cup \{\dot a_1, \dot a_2, \ldots, \dot a_n\} \cup \{\#\})^2$, where
we use the dot symbol to encode that a stone starts at this position of the trace.
We write $\tilde a$ to denote either $a$ or $\dot a$. The single input $i$ spans a binary strategy
tree. We encode the PCP instance into a HyperQPTL formula that is realizable
if and only if the PCP instance has a solution:


$$\forall q_i.\, \forall \mathbf{q}.\, \exists p_i.\, \exists \mathbf{p}.\, \forall\pi.\ ((\Box\, \pi = p_i) \rightarrow (\Box\, \pi = \mathbf{p}))\ \wedge$$
$$((\Box\, \pi = (q_i, \mathbf{q})) \rightarrow \varphi_{reduc}(q_i, \mathbf{q}, p_i, \mathbf{p})),$$


where $\mathbf{q}$ and $\mathbf{p}$ are sequences of universally and existentially quantified
propositional variables, such that for each $(o, o') \in O$, there is a $q_{(o,o')} \in \mathbf{q}$ and a
$p_{(o,o')} \in \mathbf{p}$. Together with $q_i$ and $p_i$ for the input $i$, they simulate a
universally and an existentially quantified trace from the model. The notation $\pi = \mathbf{q}$
denotes that for every $q_a \in \mathbf{q}$, it holds that $a_\pi \leftrightarrow q_a$. As seen before, the premise
$(\Box\, \pi = (q_i, \mathbf{q}))$ and the conjunct $(\Box\, \pi = p_i) \rightarrow (\Box\, \pi = \mathbf{p})$ ensure that the
propositions $(q_i, \mathbf{q})$ and $(p_i, \mathbf{p})$ are chosen to represent actual traces from the model. The
universal quantification $\forall\pi$ thus only ensures that $(q_i, \mathbf{q})$ and $(p_i, \mathbf{p})$, which are
used for the main reduction, are chosen correctly. The reduction is implemented
in the formula $\varphi_{reduc}$ and follows the construction in [10], where it is shown that
the satisfiability and realizability problem of HyperLTL are undecidable for a
$\forall\exists$ trace quantifier prefix.


$$\varphi_{reduc}(q_i, \mathbf{q}, p_i, \mathbf{p}) := \varphi_{rel}(q_i) \rightarrow \varphi_{is++}(q_i, p_i)$$
$$\wedge\ \varphi_{start}(\varphi_{stone\&shift}(\mathbf{q}, \mathbf{p}), q_i) \wedge \varphi_{sol}(q_i, \mathbf{q})$$


- $\varphi_{rel}(q_i) := \neg q_i\, \mathcal{U}\, \Box q_i$ defines the set of relevant traces through the binary
  strategy tree (see Fig. 6).

- $\varphi_{is++}(q_i, p_i) := (\neg q_i \wedge \neg p_i)\, \mathcal{U}\, (\Box q_i \wedge \neg p_i \wedge \bigcirc\Box p_i)$ defines that a relevant trace
  is the direct successor trace of another relevant trace.

- $\varphi_{sol}(q_i, \mathbf{q}) := \Box q_i \rightarrow \big(\bigvee_{i=1}^{n} q_{(\tilde a_i, \tilde a_i)}\big)\, \mathcal{U}\, \Box q_{(\#,\#)}$ ensures that
  the path on which globally $i$ holds is a "solution" trace, i.e., encodes the PCP
  solution sequence.

- $\varphi_{start}(\varphi, q_i) := \neg q_i\, \mathcal{U}\, (\varphi \wedge \Box q_i)$ cuts off an irrelevant prefix until $\varphi$ starts.

- $\varphi_{stone\&shift}(\mathbf{q}, \mathbf{p})$ encodes that the trace simulated by $\mathbf{q}$ starts with a valid
  encoding of a stone from the PCP instance and that the trace simulated by
  $\mathbf{p}$ encodes the same trace but with the first stone removed (see [10]).


For example, let $\alpha$ with $\alpha_1 = a$, $\alpha_2 = ab$, $\alpha_3 = bba$, and $\beta$ with $\beta_1 = baa$, $\beta_2 = aa$,
and $\beta_3 = bb$ be given. A possible solution for this PCP instance is $(3, 2, 3, 1)$,
since $bbaabbbaa = \alpha_3\alpha_2\alpha_3\alpha_1 = \beta_3\beta_2\beta_3\beta_1$. The full sequence at the trace $\{i\}^\omega$ represents the
solution with the outputs

$$(\dot b, \dot b)(b, b)(a, \dot a)(\dot a, a)(b, \dot b)(\dot b, b)(b, \dot b)(a, a)(\dot a, a)(\#, \#)(\#, \#) \cdots$$

The next relevant trace, therefore, contains

$$(\dot a, \dot a)(b, a)(\dot b, \dot b)(b, b)(a, \dot b)(\dot a, a)(\#, a)(\#, \#)(\#, \#) \cdots$$

Continuing this, the following relevant traces are:

$$(\dot b, \dot b)(b, b)(a, \dot b)(\dot a, a)(\#, a)(\#, \#)(\#, \#) \cdots$$
$$(\dot a, \dot b)(\#, a)(\#, a)(\#, \#)(\#, \#) \cdots$$
$$(\#, \#)(\#, \#) \cdots$$

The relevant traces verify the solution provided on the $\{i\}^\omega$ trace by removing
one stone after the other. Thus, the formula is realizable iff the PCP instance
has a solution.
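
The example instance and the stone-by-stone encoding can be reproduced with a few lines of Python. The sketch below is for illustration only and is not the reduction itself: the names `is_solution`, `dotted`, `encode`, and `relevant_traces` are hypothetical, a dotted letter such as $\dot b$ is written as the string `"b."`, and the infinite `#` padding is cut off after `pad` positions.

```python
from typing import Dict, List, Sequence, Tuple

alpha: Dict[int, str] = {1: "a", 2: "ab", 3: "bba"}
beta: Dict[int, str] = {1: "baa", 2: "aa", 3: "bb"}

def is_solution(indices: Sequence[int]) -> bool:
    """A non-empty index sequence solves the instance if both concatenations agree."""
    top = "".join(alpha[i] for i in indices)
    bottom = "".join(beta[i] for i in indices)
    return len(indices) >= 1 and top == bottom

def dotted(words: Sequence[str]) -> List[str]:
    """Letter sequence of the concatenated words; the first letter of each
    word (the start of a stone) is marked with a trailing dot."""
    return [c + ("." if k == 0 else "") for w in words for k, c in enumerate(w)]

def encode(indices: Sequence[int], pad: int = 1) -> List[Tuple[str, str]]:
    """Output sequence of one relevant trace, padded with '#' symbols."""
    top = dotted([alpha[i] for i in indices])
    bottom = dotted([beta[i] for i in indices])
    n = max(len(top), len(bottom)) + pad
    top += ["#"] * (n - len(top))
    bottom += ["#"] * (n - len(bottom))
    return list(zip(top, bottom))

def relevant_traces(indices: Sequence[int]) -> List[List[Tuple[str, str]]]:
    """The solution trace followed by the traces with one more stone removed each."""
    return [encode(indices[k:]) for k in range(len(indices) + 1)]
```

Calling `relevant_traces([3, 2, 3, 1])` yields five sequences that correspond to the five output sequences listed above, ending with the trace that only contains `("#", "#")`.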


4.3 Multiple Universal Trace Quantifiers 


When considering multiple universal trace quantifiers $\forall\pi$, the problem becomes
undecidable. This is because in HyperLTL, one can encode distributed
architectures (for which the problem is undecidable) directly into the formula without
using any propositional quantification [12].


Corollary 3. Realizability of the $\forall^*_\pi$ fragment is in general undecidable.


However, we show that the realizability problem for formulas with more than one
universal trace quantifier is decidable if we restrict ourselves to formulas in the
so-called linear fragment, i.e., the fragment that does not allow an encoding of a
distributed architecture. We define the linear fragment of HyperQPTL, where the
definitions are adopted from [12].

Let $A, C \subseteq AP$. We define that atomic propositions $c \in C$ solely depend
on propositions $a \in A$ as the HyperQPTL formula


$$D_{A \mapsto C} := \forall\pi.\, \forall\pi'.\ \Big(\bigvee_{a \in A} (a_\pi \nleftrightarrow a_{\pi'})\Big)\, \mathcal{R}\, \Big(\bigwedge_{c \in C} (c_\pi \leftrightarrow c_{\pi'})\Big).$$


We define a collapse function, which collapses a HyperQPTL formula with a $\forall^*_\pi$
universal quantifier prefix into a formula with a single $\forall_\pi$ quantifier.
Propositional quantifiers are preserved by the operation. Let $\varphi$ be $\forall\pi_1 \cdots \forall\pi_n.\, Q^*_q.\, \psi$.
We define the collapsed formula of $\varphi$ as $collapse(\varphi) := \forall\pi.\, Q^*_q.\, \psi[\pi_1 \mapsto \pi][\pi_2 \mapsto
\pi] \ldots [\pi_n \mapsto \pi]$, where $\psi[\pi_i \mapsto \pi]$ replaces all occurrences of $\pi_i$ in $\psi$ with $\pi$.
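
The collapse operation is a plain renaming of trace variables and can be sketched on a small formula representation. The Python code below is an illustration under stated assumptions: formulas are encoded as nested tuples with hypothetical tags (`"ap"` for an indexed atomic proposition, `"prop"` for a quantified proposition, `"forall_q"` and `"exists_q"` for propositional quantifiers, and operator names such as `"or"` or `"not"`), and the names `rename` and `collapse` are chosen for this example only.

```python
from typing import Sequence, Tuple

Formula = Tuple  # nested tuples, e.g. ("or", ("ap", "a", "pi1"), ("prop", "q"))

def rename(formula: Formula, trace_vars: Sequence[str], target: str) -> Formula:
    """Replace every trace variable from trace_vars by target."""
    head = formula[0]
    if head == "ap":  # ("ap", name, trace_variable)
        _, name, var = formula
        return ("ap", name, target if var in trace_vars else var)
    if head == "prop":  # ("prop", q): quantified propositions are untouched
        return formula
    if head in ("forall_q", "exists_q"):  # (tag, q, body): quantifier is preserved
        return (head, formula[1], rename(formula[2], trace_vars, target))
    # any other operator: rename inside all subformulas
    return (head,) + tuple(rename(sub, trace_vars, target) for sub in formula[1:])

def collapse(trace_vars: Sequence[str], body: Formula, target: str = "pi") -> Formula:
    """Collapse the prefix 'forall pi_1 ... forall pi_n' over body into a
    single universal trace quantifier over target."""
    return ("forall_pi", target, rename(body, trace_vars, target))
```

For example, collapsing $\forall\pi_1.\forall\pi_2.\ a_{\pi_1} \vee \neg b_{\pi_2}$ yields $\forall\pi.\ a_\pi \vee \neg b_\pi$, while propositional quantifiers inside the body remain in place.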


Lemma 4. Either $\varphi \equiv collapse(\varphi)$ or $\varphi$ has no equivalent $\forall_\pi Q^*_q$ formula.


Proof. The collapse function solely works on the trace quantification mechanism
of the HyperQPTL formula, by reducing it to a single universal quantification.
The statement has been proven for $\forall^*$ HyperLTL formulas in [12]. Inner
propositional quantification does not interfere with this mechanism; hence, the
proof can be carried out identically.


Now we can formally define the linear $\forall^*_\pi$ fragment. Intuitively, we require
that every input-output dependency can be ordered linearly, i.e., we are
restricted to linear architectures without information forks (see Example 3).


Definition 3. Let $O = \{o_1, \ldots, o_n\}$. A HyperQPTL formula $\varphi$ is called
linear if for all $o_i \in O$ there is a $J_i \subseteq I$ such that $\varphi \wedge D_{I \mapsto O} \equiv collapse(\varphi) \wedge
\bigwedge_{o_i \in O} D_{J_i \mapsto \{o_i\}}$ and $J_i \subseteq J_{i+1}$ for all $i < n$.


This results in the following corollary. Since the universal quantifiers can be 
collapsed, the resulting problem is the realizability problem of QPTL in a linear 
architecture, which is decidable [17]. 


Corollary 4. Realizability of the linear $\forall^*_\pi Q^*_q$ fragment is decidable.


Remark on Complexities. Our aim was to work out the largest possible fragments 
for which the realizability problem of HyperQPTL remains decidable. The three
fragments for which we could prove decidability all subsume the logic QPTL, for 
which the realizability problem is known to be non-elementary (already its sat- 
isfiability problem is non-elementary [30]). Hence, realizability of the discussed 
HyperQPTL fragments has a non-elementary lower bound. Finding interest- 
ing fragments for which the problem has a more feasible complexity therefore 
remains an open challenge. 


5 Experiments 


We have implemented a prototype tool that can solve the HyperQPTL realizability
problem using the bounded synthesis approach [18]. More concretely,
we extended the HyperLTL synthesis tool BoSy [7,9,12]. BoSy reduces the
HyperLTL synthesis problem to an SMT constraint system, which is then solved
by z3 [8] (for more details, see [12]). We implemented the reduction of HyperQPTL
synthesis to HyperLTL synthesis (Corollary 2) in BoSy, such that the tool can
also handle HyperQPTL formulas. We evaluated the tool against a range of
benchmark sets, shown in Table 1. The first column indicates the parameterized
benchmark name. The second and third columns indicate the bounds given
to the bounded synthesis procedure. The second column is the bound on the size
of the system. The newest version of BoSy also bounds the size of the strategy
for the existential player; this bound is given in column three. For a detailed
explanation of how existential strategies are bounded in BoSy, we refer to [7].


Table 1. Experimental results for prompt arbiter

Instance              | Bound on system | Bound on ∃-strategy | Result | Time [sec.]
arbiter-2-prompt      | 2               | 1                   | unsat  | <1
                      | 2               | 2                   | sat    | <1
arbiter-2-full-prompt | 3               | 1                   | unsat  | 2.4
                      | 3               | 2                   | sat    | 6.0
arbiter-3-prompt      | 3               | 1                   | unsat  | 4.2
                      | 3               | 2                   | sat    | 9.5
arbiter-4-prompt      | 4               | 1                   | unsat  | 97
                      | 4               | 2                   | ?      | TO

We synthesized a range of resource arbiters. Our benchmark set is parametric
in the number of clients that can request access to the shared resource (written
arbiter-k-prompt in Table 1, where k is the number of clients). Unlike normal
arbiters, we require the arbiter to fulfill promptness for some of the clients, i.e.,
requests must be answered within a bounded number of steps [33]. We state
the promptness requirement in HyperQPTL by applying the alternating-color
technique from [24]. Intuitively, the alternating-color technique works as follows:
We quantify a q-sequence that "changes color" between $q$ and $\neg q$. Each change
of color is used as a potential bound. Once a request occurs, the grant must be
given within two changes of color. Thus, the HyperQPTL formulation amounts
to the following specifications, shown here for 2 clients, where we require
promptness only for client 1.


$$\forall \pi.\ \Box \neg (g^1_\pi \wedge g^2_\pi) \tag{1}$$

$$\forall \pi.\ \Box (r^2_\pi \to \Diamond g^2_\pi) \tag{2}$$

$$\exists q.\ \forall \pi.\ \Box\Diamond q \wedge \Box\Diamond \neg q \wedge \Box\big(r^1_\pi \to (q \to (q \,\mathcal{U}\, (\neg q \,\mathcal{U}\, g^1_\pi))) \wedge (\neg q \to (\neg q \,\mathcal{U}\, (q \,\mathcal{U}\, g^1_\pi)))\big) \tag{3}$$

$$\forall \pi.\ (\neg g^1_\pi \,\mathcal{W}\, r^1_\pi) \wedge (\neg g^2_\pi \,\mathcal{W}\, r^2_\pi) \tag{4}$$

Formula 1 states mutual exclusion. Formula 2 states that client 2 must be served
eventually (but not within a bounded number of steps). Formula 3 states the
promptness requirement for client 1. It quantifies an alternating q-sequence,
which serves as a sequence of global bounds that must be respected on all traces
$\pi$. Then, if client 1 poses a request, the grant must be given within two changes
of the value of $q$. Formula 4 is only added in benchmarks named arbiter-k-full-prompt.
It specifies that no spurious grants should be given.

BoSy successfully synthesizes prompt arbiters of up to 3 states. For a 4-state
prompt arbiter, BoSy did not return in reasonable time.


6 Conclusion 


We studied the hyperlogic HyperQPTL, which combines the concepts of trace
relations and $\omega$-regularity. We showed that HyperQPTL is very expressive: it
can express properties like promptness, bounded waiting for a grant, epistemic
properties, and, in particular, any $\omega$-regular property. Those properties are not
expressible in previously studied hyperlogics like HyperLTL. At the same time,
we argued that the expressiveness of HyperQPTL is optimal in the sense that
a more expressive logic for $\omega$-regular hyperproperties would have an undecidable
model checking problem. We furthermore studied the realizability problem
of HyperQPTL. We showed that realizability is decidable for HyperQPTL
fragments that contain properties like promptness. But still, in contrast to the
satisfiability problem, propositional quantification does make the realizability
problem of hyperlogics harder. More specifically, the HyperQPTL fragment of
formulas with a universal-existential propositional quantifier alternation followed
by a single trace quantifier is undecidable in general, even though the projection
of the fragment to HyperLTL has a decidable realizability problem. Lastly, we
implemented the bounded synthesis problem for HyperQPTL in the prototype
tool BoSy. Using BoSy with HyperQPTL specifications, we have been able to
synthesize several resource arbiters. The synthesis problem of non-linear-time
hyperlogics is still open. For example, it is not yet known how to synthesize systems
from specifications given in branching-time hyperlogics like HyperCTL*.


Abstract. The correctness of networks is often described in terms of
the individual data flow of components instead of their global behavior. 
In software-defined networks, it is far more convenient to specify the cor- 
rect behavior of packets than the global behavior of the entire network. 
Petri nets with transits extend Petri nets and Flow-LTL extends LTL 
such that the data flows of tokens can be tracked. We present the tool 
ADAMMC as the first model checker for Petri nets with transits against 
Flow-LTL. We describe how ADAMMC can automatically encode con- 
current updates of software-defined networks as Petri nets with transits 
and how common network specifications can be expressed in Flow-LTL. 
Underlying ADAMMC is a reduction to a circuit model checking prob- 
lem. We introduce a new reduction method that results in tremendous 
performance improvements compared to a previous prototype. Thereby, 
ADAMMC can handle software-defined networks with up to 82 switches. 


1 Introduction 


In networks, it is difficult to specify correctness in terms of the global behavior 
of the entire system. Instead, it is far more convenient to specify correct
behavior in terms of the individual flow of components. For example, loop and drop freedom can
be easily specified for the flow of each packet. Petri nets and LTL lack this local 
view. Petri nets with transits and Flow-LTL have been introduced to overcome 
this restriction [10]. A transit relation is introduced to follow the flow induced 
by tokens. Flow-LTL is a temporal logic to specify both the local flow of data 
and the global behavior of markings. The global behavior as in Petri nets and 
LTL is still important for maximality and fairness assumptions. In this paper,
we present the tool ADAMMC¹ as the first model checker for Petri nets with
transits against Flow-LTL and its application to software-defined networking.

¹ ADAMMC is available online at https://uol.de/en/csd/adammce [12].


Fig. 1. Access control at an airport modeled as Petri net with transits. Colored arrows
display the transit relation and define flow chains to model the passengers.


In Fig. 1, we present an example of a Petri net with transits that models the 
security check at an airport where passengers are checked by a security guard. 
The number of passengers entering the airport is unknown in advance. Rather
than introducing the complexity of an infinite number of tokens, we use a fixed 
number of tokens to model possibly infinitely many flow chains. This is done by 
the transit relation which is depicted with colored arrows. 

The left-hand side of Fig. 1 models passengers who want to reach the ter- 
minal. There are three tokens in the places airport, queue, and terminal. Thus, 
transitions start and en are always enabled. Each firing of start creates a new 
flow chain as depicted by the green arrow. This models a new person arriving at 
the airport. Meanwhile, the double-headed blue arrow maintains all flow chains 
that are still in place airport. Passengers have to enter the queue and wait until 
the security check is performed. Therefore, transition en continues every flow 
chain in airport to queue. Checking the passengers is carried out by transition 
check which becomes enabled if the security guard works. Thus, passengers residing
in queue have to wait until the guard checks them. Afterwards, they reach
the terminal. The security guard is modeled on the right-hand side of Fig. 1. By 
firing comeToWork and thus moving the token in place home, her flow chain 
starts and she can repeatedly either idle or work, check passengers, and return. 
Her transit relation is depicted in orange and models exactly one flow chain. 

In Fig. 1, we define the checkpoints $cp_1$ and $cp_2$ and the booth as a security
zone and require that passengers never enter the security zone and eventually
reach the terminal. The flow formula
$\varphi = \mathbb{A}(\mathit{airport} \to (\Box\neg(cp_1 \vee cp_2 \vee \mathit{booth}) \wedge \Diamond\,\mathit{terminal}))$
specifies this. ADAMMC verifies the example from Fig. 1 against
the formula $\Box\Diamond\,\mathit{check} \to \varphi$, specifying that if passengers are checked regularly,
then they cannot access the security zone and eventually reach the terminal.

In this paper, we present ADAMMC as a full-fledged tool. First, ADAMMC
can handle Petri nets with transits and Flow-LTL formulas in general. Second,
ADAMMC has an input interface for a concurrent update and a software-defined
network and encodes both of them as a Petri net with transits. Common
assumptions on fairness and requirements for network correctness are also provided
as Flow-LTL formulas. This allows users of the tool to model check the
correctness of concurrent updates and to prevent packet loss, routing loops, and
network congestion. Third, ADAMMC provides algorithms to check safe Petri
nets against LTL with both places and transitions as atomic propositions, which
makes it especially easy to specify fairness and maximality assumptions.

The tool reduces the model checking problem for safe Petri nets with transits 
against Flow-LTL to the model checking problem for safe Petri nets against LTL. 
We develop a new parallel approach to check global and local behavior in
parallel instead of sequentially. This approach yields a tremendous speed-up for 
a few local requirements and realistic fairness assumptions in comparison to the 
sequential approach of a previous prototype [10]. In general, the parallel approach 
has worst-case complexity inferior to the sequential approach even though the 
complexities of both approaches are the same when using only one flow formula. 

As the last step, ADAMMC reduces the model checking problem of safe Petri
nets against LTL to a circuit model checking problem. This is solved by ABC 
[2,4] with effective verification techniques like IC3 and bounded model checking. 
ADAMMC verifies concurrent updates of software-defined networks with up to 
38 switches (31 more than the prototype) and falsifies concurrent updates of 
software-defined networks with up to 82 switches (44 more than the prototype). 

The paper is structured as follows: In Sect. 2, we recall Petri nets with transits
and Flow-LTL. In Sect. 3, we outline the three application areas of ADAMMC: 
checking safe Petri nets with transits against Flow-LTL, checking concurrent 
updates of software-defined networks against common assumptions and specifi- 
cations, and checking safe Petri nets against LTL. In Sect. 4, we algorithmically 
encode concurrent updates of software-defined networks in Petri nets with tran- 
sits. In Sect. 5, we introduce the parallel approach for the underlying circuit 
model checking problem. In Sect. 6, we present our experimental evaluation. 

Further details can be found in the full paper [13]. 


2 Petri Nets with Transits and Flow-LTL 


A safe Petri net with transits $\mathcal{N} = (\mathcal{P}, \mathcal{T}, \mathcal{F}, \mathit{In}, \Upsilon)$ [10] contains the set of
places $\mathcal{P}$, the set of transitions $\mathcal{T}$, the flow relation
$\mathcal{F} \subseteq (\mathcal{P} \times \mathcal{T}) \cup (\mathcal{T} \times \mathcal{P})$,
and the initial marking $\mathit{In} \subseteq \mathcal{P}$ as in safe Petri nets [27]. In a safe Petri net,
reachable markings contain at most one token per place. The transit relation $\Upsilon$
is for every transition $t \in \mathcal{T}$ of type
$\Upsilon(t) \subseteq (\mathit{pre}^{\mathcal{N}}(t) \cup \{\triangleright\}) \times \mathit{post}^{\mathcal{N}}(t)$.
With $p\ \Upsilon(t)\ q$, we define that firing transition $t$ transits the flow in place $p$
to place $q$. The symbol $\triangleright$ denotes a start and $\triangleright\ \Upsilon(t)\ q$ defines that firing
transition $t$ starts a new flow for the token in place $q$. Note that the transit relation
can split, merge, and end flows. A sequence of flows leads to a flow chain which
is a sequence of the current place and the fired outgoing transition. Thus, Petri
nets with transits can describe both the global progress of tokens and the local
flow of data.
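
The interplay between token movement and the transit relation can be illustrated with a small Python sketch. It is a simplified model under explicit assumptions: the class `Transition`, the function `fire`, the marker `START` (standing in for the start symbol), and the two-place example net are hypothetical and not part of ADAMMC; flow chains are recorded as sequences of places only, omitting the fired transitions.

```python
from dataclasses import dataclass

START = "START"  # stands in for the start symbol of the transit relation


@dataclass(frozen=True)
class Transition:
    name: str
    pre: frozenset       # places consumed from
    post: frozenset      # places produced to
    transits: frozenset  # pairs (p, q) with p in pre or p == START, and q in post


def fire(marking, chains, t):
    """Fire t in a safe net: move the tokens and update the tracked flow chains."""
    if not t.pre <= marking:
        raise ValueError(f"transition {t.name} is not enabled")
    new_marking = (marking - t.pre) | t.post
    new_chains = []
    for chain in chains:
        here = chain[-1]
        if here not in t.pre:
            new_chains.append(chain)  # chain is not affected by t
            continue
        # The chain is extended along every transit leaving its current place
        # (a split); if there is none, the chain ends and is dropped.
        for p, q in sorted(t.transits):
            if p == here:
                new_chains.append(chain + (q,))
    for p, q in sorted(t.transits):
        if p == START:
            new_chains.append((q,))  # a new flow chain starts in q
    return new_marking, new_chains


# Hypothetical net: "new" creates a flow in place a, "move" transits it to b.
new = Transition("new", frozenset({"a"}), frozenset({"a"}),
                 frozenset({(START, "a"), ("a", "a")}))
move = Transition("move", frozenset({"a", "b"}), frozenset({"a", "b"}),
                  frozenset({("a", "b"), ("b", "b")}))

marking, chains = frozenset({"a", "b"}), []
for t in (new, new, move):
    marking, chains = fire(marking, chains, t)
```

Although the marking never changes, two flow chains exist after the three firings, which mirrors how a fixed number of tokens can carry arbitrarily many flows.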

Flow-LTL [10] extends Linear-time Temporal Logic (LTL) and uses places
and transitions as atomic propositions. It introduces $\mathbb{A}$ as a new operator which
uses LTL to specify the flow of data for all flow chains. For Fig. 1, the formula
$\mathbb{A}(\mathit{booth} \to \Diamond\,\mathit{check})$ specifies that the guard performs at least one check. We call
formulas starting with $\mathbb{A}$ flow formulas. Formulas around flow formulas specify
the global progress of tokens in the form of markings and fired transitions to
formalize maximality and fairness assumptions. These formulas are called run
formulas. Often, Flow-LTL formulas have the form run formula $\to$ flow formula.


Fig. 2. Overview of the workflow of ADAMMC: The application areas of the tool are
given by three different input domains: software-defined network/Flow-LTL (Input
I), Petri nets with transits/Flow-LTL (Input II), and Petri nets/LTL (Input III).
ADAMMC performs all unlabeled steps. MCHyper creates the final circuit which ABC
checks to answer the initial model checking problem.


3 Application Areas 


ADAMMC consists of modules for three application areas: checking safe Petri 
nets with transits against Flow-LTL, checking concurrent updates of software- 
defined networks against common assumptions and specifications, and checking 
safe Petri nets against LTL. The general architecture and workflow of the model 
checking procedure are given in Fig. 2. ADAMMC is based on the tool ADAM [14].

Petri Nets with Transits. Petri nets with transits follow the progress of
tokens and the flow of data. Flow-LTL allows specifying requirements on both.
For Petri nets with transits and Flow-LTL (Input II), ADAMMC extends a parser
for Petri nets provided by APT [30], provides a parser for Flow-LTL, and implements
two reduction methods to create a safe Petri net and an LTL formula.
The sequential approach is outlined in [10] and the parallel approach in Sect. 5.

Software-Defined Networks. Concurrent updates of software-defined networks
are the second application area of ADAMMC. The tool automatically
encodes an initially configured network topology and a concurrent update as a
Petri net with transits. The concurrent update renews the forwarding table. We
provide parsers for the network topology, the initial configuration, the concurrent
update, and Flow-LTL (Input I). In Sect. 4, we present the creation of a Petri
net with transits from the input and Flow-LTL formulas for common network
properties like connectivity, loop freedom, drop freedom, and packet coherence.

Petri Nets. ADAMMC supports the model checking of safe Petri nets
against LTL with both places and transitions as atomic propositions. It provides
dedicated algorithms to check interleaving-maximal runs of the system.
A run is interleaving-maximal if a transition is fired whenever a transition is
enabled. Furthermore, ADAMMC allows a concurrent view on runs and can check
concurrency-maximal runs which demand that each subprocess of the system has
to progress maximally rather than only the entire system. State-of-the-art tools
like LoLA [32] and ITS-Tools [29] are restricted to interleaving-maximal runs
and places as atomic propositions. For Petri net model checking (Input III), we
allow Petri nets in APT and PNML format as input and provide a parser for
LTL formulas.

The construction of the circuit in Aiger format [3] is defined in [11]. MCHyper [15]
is used to create a circuit from a given circuit and an LTL formula.
This circuit is given to ABC [2,4] which provides a toolbox of modern hardware
verification algorithms like IC3 and bounded model checking to decide the initial
model checking question. As output for all three modules, ADAMMC transforms
a possible counterexample (CEX) from ABC into a counterexample to the Petri
net (with transits) and visualizes the net with Graphviz and the dot language [9].
When no counterexample exists, ADAMMC has verified the input successfully.


4 Verifying Updates of Software Defined Networks 


We show how ADAMMC can check concurrent updates of realistic examples from 
software-defined networking (SDN) against typical specifications [19]. SDN [6, 25] 
separates the data plane for forwarding packets and the control plane for the 
routing configuration. A central controller initiates updates which can cause 
problems like routing loops or packet loss. ADAMMC provides an input interface 
to automatically encode software-defined networks and concurrent updates of 
their configuration as Petri nets with transits. The tool checks requirements like 
loop and drop freedom to find erroneous updates before they are deployed. 


4.1 Network Topology, Configurations, and Updates 


A network topology T is an undirected graph T = (Sw, Con) with switches as 
vertices and connections between switches as edges. Packets enter the network 
at ingress switches and they leave at egress switches. Forwarding rules are of the 
form x.fwd(y) with x,y € Sw. A concurrent update has the following syntax: 


switch update     ::= upd(x.fwd(y/z)) | upd(x.fwd(y/-)) | upd(x.fwd(-/z))
sequential update ::= (update >> update >> ... >> update)
parallel update   ::= (update || update || ... || update)
update            ::= switch update | sequential update | parallel update


where a switch update can renew the forwarding rule of switch x from switch z 
to switch y, introduce a new forwarding rule from switch x to switch y, or remove 
an existing forwarding rule from switch x to switch z. 
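
The update syntax maps naturally onto a small recursive data type. The following Python sketch shows one possible representation; the class names `SwitchUpdate`, `Sequential`, and `Parallel` and the helper `show` are hypothetical and do not reflect the input parser of ADAMMC. A missing forwarding target (written `-` in the syntax) is modeled as `None`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class SwitchUpdate:
    switch: str                # x
    new_target: Optional[str]  # y, or None for "-": no rule is introduced
    old_target: Optional[str]  # z, or None for "-": no rule is removed


@dataclass(frozen=True)
class Sequential:
    parts: Tuple["Update", ...]


@dataclass(frozen=True)
class Parallel:
    parts: Tuple["Update", ...]


Update = Union[SwitchUpdate, Sequential, Parallel]


def show(update: Update) -> str:
    """Print an update in the concrete syntax given above."""
    if isinstance(update, SwitchUpdate):
        new = update.new_target or "-"
        old = update.old_target or "-"
        return f"upd({update.switch}.fwd({new}/{old}))"
    separator = " >> " if isinstance(update, Sequential) else " || "
    return "(" + separator.join(show(part) for part in update.parts) + ")"


# First redirect switch a from b to c, then change b and c concurrently.
example = Sequential((
    SwitchUpdate("a", "c", "b"),
    Parallel((SwitchUpdate("b", "d", None), SwitchUpdate("c", None, "a"))),
))
text = show(example)
```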


4.2 Data Plane and Control Plane as Petri Net with Transits


For a network topology T = (Sw, Con), a set of ingress switches, a set of egress
switches, an initial forwarding table, and a concurrent update, we show how data
and control plane are encoded as Petri net with transits. Switches are modeled
by tokens remaining in corresponding places s whereas the flow of packets is
modeled by the transit relation $\Upsilon$. Specific transitions $i_s$ model ingress switches
where new data flows begin. Tokens in places of the form x.fwd(y) configure the
forwarding. Data flows are extended by firing transitions (x,y) corresponding
to configured forwarding without moving any tokens. Thus, we model any order
of newly generated packets and their forwarding. Assuming that each existing
direction of a connection between two switches is explicitly given in Con, we
obtain Algorithm 1 which calls Algorithm 2 to obtain the control plane.


Algorithm 1: Data plane

input : T = (Sw, Con), ingress, forwarding, update
output: Petri net with transits $\mathcal{N} = (\mathcal{P}, \mathcal{T}, \mathcal{F}, \mathit{In}, \Upsilon)$ for
        update of topology T with ingress and forwarding

create empty $\mathcal{N} = (\mathcal{P}, \mathcal{T}, \mathcal{F}, \mathit{In}, \Upsilon)$;
for switch s ∈ Sw do
    add place s to $\mathcal{P}$;
    add place s to $\mathit{In}$;
    if s ∈ ingress then
        add transition $i_s$ to $\mathcal{T}$;
        add s to pre($i_s$), post($i_s$);
        add creating data flow $\triangleright\ \Upsilon(i_s)\ s$ to $\Upsilon$;
        add maintaining data flow $s\ \Upsilon(i_s)\ s$ to $\Upsilon$;
for connection (x, y) ∈ Con do
    add place x.fwd(y) to $\mathcal{P}$;
    if x.fwd(y) ∈ forwarding then
        add place x.fwd(y) to $\mathit{In}$;
    add transition (x, y) to $\mathcal{T}$;
    add x, y, x.fwd(y) to pre((x, y)), post((x, y));
    add connecting data flow $x\ \Upsilon((x, y))\ y$ to $\Upsilon$;
    add maintaining data flow $y\ \Upsilon((x, y))\ y$ to $\Upsilon$;
$\mathcal{N}$ = call Algorithm 2 with T, update, $\mathcal{N}$ as input;
add place $\mathit{update}^s$ to $\mathit{In}$;


Algorithm 2: Control plane

input : T = (Sw, Con), update, $\mathcal{N}$
output: $\mathcal{N} = (\mathcal{P}, \mathcal{T}, \mathcal{F}, \mathit{In}, \Upsilon)$

for switch update u ∈ SwU do
    // u = upd(x.fwd(y/z))
    add places $u^s$, $u^f$ to $\mathcal{P}$;
    add transition u to $\mathcal{T}$;
    add $u^s$ to pre(u), $u^f$ to post(u);
    if z ≠ - then
        add x.fwd(z) to pre(u);
    if y ≠ - then
        add x.fwd(y) to post(u);
for sequential update s ∈ SeU do
    // s = [$s_1, s_2, \ldots, s_{|s|}$]
    add places $s^s$, $s^f$ to $\mathcal{P}$;
    for i ∈ {0, ..., |s|} do
        add transition $s^i$ to $\mathcal{T}$;
        if i = 0 then
            add $s^s$ to pre($s^i$);
        else
            add $s_i^f$ to pre($s^i$);
        if i = |s| then
            add $s^f$ to post($s^i$);
        else
            add $s_{i+1}^s$ to post($s^i$);
for parallel update p ∈ PaU do
    add places $p^s$, $p^f$ to $\mathcal{P}$;
    add transitions $p^s$, $p^f$ to $\mathcal{T}$;
    add $p^s$ to pre($p^s$), $p^f$ to post($p^f$);
    for sub-update $u_i$ of p do
        add $u_i^s$ to post($p^s$), $u_i^f$ to pre($p^f$);


For the update, let SwU be the set of switch updates in it, SeU the set of
sequential updates in it, and PaU the set of parallel updates in it. Depending
on update's type, it is also added to the respective set. The subnet for the update
has an empty transit relation but moves tokens from and to places of the form
x.fwd(y). Tokens in these places correspond to the forwarding table. The order
of the switch updates is defined by the nesting of sequential and parallel updates.
The update is realized by a specific token moving through unique places of the
form $u^s, u^f, s^s, s^f, p^s, p^f$ for start and finish of each switch update u ∈ SwU, each
sequential update s ∈ SeU, and each parallel update p ∈ PaU. A parallel update
temporarily increases the number of tokens and reduces it upon completion to
one. Algorithm 2 defines the update behavior between start and finish places
and connects finish and start places depending on the subexpression structure.
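
As an illustration of the data-plane encoding, the following Python sketch follows the steps of Algorithm 1 but omits the control plane of Algorithm 2. The function name `data_plane`, the dictionary-based net representation, and the marker `START` for the start of a new data flow are assumptions made for this example and do not correspond to the actual implementation in ADAMMC.

```python
START = "START"  # stands in for the start symbol of the transit relation


def data_plane(switches, connections, ingress, forwarding):
    """Build places, initial marking, and transitions of the data plane.

    connections and forwarding are collections of directed pairs (x, y);
    a pair in forwarding means that the rule x.fwd(y) is initially active.
    """
    places, initial, transitions = set(), set(), {}
    for s in switches:
        places.add(s)
        initial.add(s)  # switches are tokens that never move
        if s in ingress:
            transitions[f"i_{s}"] = {
                "pre": {s},
                "post": {s},
                # one transit creates a new packet, one keeps existing packets in s
                "transits": {(START, s), (s, s)},
            }
    for x, y in connections:
        rule = f"{x}.fwd({y})"
        places.add(rule)
        if (x, y) in forwarding:
            initial.add(rule)
        transitions[(x, y)] = {
            "pre": {x, y, rule},
            "post": {x, y, rule},
            # packets in x move to y, packets already in y stay there
            "transits": {(x, y), (y, y)},
        }
    return places, initial, transitions


places, initial, transitions = data_plane(
    switches=["a", "b"],
    connections=[("a", "b"), ("b", "a")],
    ingress={"a"},
    forwarding={("a", "b")},
)
```

In the example, the place `b.fwd(a)` exists but is not initially marked, so the transition `("b", "a")` can only fire after an update has activated that rule.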


Fig. 3. Overview of the sequential approach: Each firing of a transition of the original
net is split into first firing a transition in the subnet for the run formula and subsequently
firing a transition in each subnet tracking a flow formula. The constructed LTL
formula skips the additional steps with until operators.


Fig. 4. Overview of the parallel approach: The n subnets are connected such that for
every transition $t \in \mathcal{T}$ there are $(|\Upsilon(t)| + 1)^n$ transitions, i.e., there is one transition
for every combination of which transit of t (or none) is tracked by which subnet. We
use until operators in the constructed LTL formula to only skip steps not involving the
tracking of the guessed chain in the flow formula.


4.3 Assumptions and Requirements 


We use the run formula $\Diamond\Box\,\mathit{pre}(t) \to \Box\Diamond\, t$ to assume weak fairness for every
transition $t$ in our encoding $\mathcal{N}$. Transitions that are always enabled after
some point are ensured to fire infinitely often. Thus, packets are eventually
forwarded and the routing table is eventually updated. We use flow formulas to
test specific requirements for all packets. Connectivity
($\mathbb{A}(\Diamond \bigvee_{s \in \mathit{egress}} s)$) ensures
that all packets reach an egress switch. Packet coherence
($\mathbb{A}(\Box(\bigvee_{s \in \mathit{initial}} s) \vee \Box(\bigvee_{s \in \mathit{final}} s))$)
tests that packets are either routed according to the initial or final
configuration. Drop freedom
($\mathbb{A}\,\Box(\bigwedge_{e \in \mathit{egress}} \neg e \to \bigvee_{f \in \mathit{Con}} f)$)
forbids dropped packets whereas loop freedom
($\mathbb{A}\,\Box(\bigwedge_{s \in \mathit{Sw} \setminus \mathit{egress}} s \to (s \,\mathcal{U}\, \Box\neg s))$)
forbids routing loops. We combine run and flow formula into fairness $\to$ requirement.


5 Algorithms and Optimizations


Central to model checking a Petri net with transits $\mathcal{N}$ against a Flow-LTL formula
$\varphi$ is the reduction to a safe Petri net $\mathcal{N}^{>}$ and an LTL formula $\varphi^{>}$. The infinite state
space of the Petri net with transits due to possibly infinitely many flow chains is
reduced to a finite state model. The key idea is to guess and track a violating flow
chain for each flow subformula $\mathbb{A}\,\psi_i$, for $i \in \{1, \ldots, n\}$, and to only once check the
equivalent future of flow chains merging into a common place.

ADAMMC provides two approaches for this reduction: Fig. 3 and Fig. 4 give
an overview of the sequential approach and the parallel approach, respectively.
Both algorithms create one subnet $\mathcal{N}_i^{>}$ for each flow subformula $\mathbb{A}\,\psi_i$ to track
the corresponding flow chain and have one subnet $\mathcal{N}_O^{>}$ to check the run part
of the formula. The places of $\mathcal{N}_O^{>}$ are copies of the places in $\mathcal{N}$ such that the
current state of the system can be memorized. The subnets $\mathcal{N}_i^{>}$ also consist
of the original places of $\mathcal{N}$ but only use one token (initially residing on an
additional place) to track the current state of the considered flow chain. The
approaches differ in how these nets are connected to obtain $\mathcal{N}^{>}$.

Sequential Approach. The places in each subnet $\mathcal{N}_i^{>}$ are connected with
one transition for each transit ($\mathcal{T}_{fl} = \bigcup_{t \in \mathcal{T}} \Upsilon(t)$). An additional token iterates
sequentially through the subnets to activate or deactivate the subnet. This allows
each subnet to track a flow chain corresponding to firing a transition in $\mathcal{N}_O^{>}$. The
formula $\varphi^{>}$ takes care of these additional steps by means of the until operator:
In the run part of the formula, all steps corresponding to moves in a subnet $\mathcal{N}_i^{>}$
are skipped and, for each subformula $\mathbb{A}\,\psi_i$, all steps are skipped until the next
transition of the corresponding subnet is fired which transits the tracked flow
chain. This technique results in a polynomial increase of the size of the Petri
net and the formula: $\mathcal{N}^{>}$ has $O(|\mathcal{N}| \cdot n + |\mathcal{N}|)$ places and $O(|\mathcal{N}|^3 \cdot n + |\mathcal{N}|)$
transitions and the size of $\varphi^{>}$ is in $O(|\mathcal{N}|^3 \cdot n \cdot |\varphi| + |\varphi|)$. We refer to [11] for
formal details.

Parallel Approach. The n subnets are connected such that the current
chain of each subnet is tracked simultaneously while firing an original transition
$t \in \mathcal{T}$. Thus, there are $(|\Upsilon(t)| + 1)^n$ transitions. Each of these transitions stands
for exactly one combination of which subnet is tracking which (or no) transit.
Hence, firing one transition of the original net is directly tracked in one step
for all subnets. This significantly reduces the complexity of the run part of the
constructed formula, since no until operator is needed to skip sequential steps. A
disjunction over all transitions corresponding to an original transition suffices to
ensure correctness of the construction. Transitions and next operators in the flow
parts of the formula still have to be replaced by means of the until operator to
ensure that the next step of the tracked flow chain is checked at the corresponding
step of the global timeline of $\mathcal{N}^{>}$. In general, the parallel approach results in an
exponential blow-up of the net and the formula: $\mathcal{N}^{>}$ has $O(|\mathcal{N}| \cdot n + |\mathcal{N}|)$ places
and $O(|\mathcal{N}|^{3n} + |\mathcal{N}|)$ transitions and the size of $\varphi^{>}$ is in $O(|\mathcal{N}|^{3n} \cdot |\varphi| + |\varphi|)$. For
the practical examples, however, the parallel approach allows for model checking
Flow-LTL with few flow subformulas with a tremendous speed-up in comparison
to the sequential approach. Formal details are in the full version of the paper [13].
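
The number of transitions created per original transition in the parallel approach can be reproduced with a short enumeration. The sketch below is illustrative only: the function name `parallel_copies` is hypothetical, and `None` is used to express that a subnet tracks no transit of the fired transition.

```python
from itertools import product


def parallel_copies(transits, n):
    """Enumerate which transit of a transition (or none) each of n subnets tracks."""
    choices = list(transits) + [None]  # None: the subnet tracks no transit of t
    return list(product(choices, repeat=n))


# A transition with two transits and a formula with two flow subformulas.
copies = parallel_copies([("x", "y"), ("y", "y")], n=2)
count = len(copies)  # (2 + 1) ** 2 combinations
```

Each tuple in `copies` corresponds to one transition of the constructed net, which makes the exponential dependence on the number of flow subformulas directly visible.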


Table 1. Overview of optimization parameters of ADAMMC: The three reduction steps
depicted in the first column can each be executed by different algorithms. The first step
allows combining the optimizations of the first and second row.

Reduction step                         | Options
1) Petri Net with Transits ⇝ Petri Net | sequential             | parallel
                                       | inhibitor | act. token | inhibitor | act. token
2) Petri Net ⇝ Circuit                 | explicit               | logarithmic
3) Circuit ⇝ Circuit                   | gate optimizations


Optimizations. Various optimization parameters can be applied to the model
checking routine described in Sect. 3 to tweak the performance. Table 1 gives an
overview of the major parameters.

We found that the versions of the sequential and the parallel approach with
inhibitor arcs to track flow chains are generally faster than the versions without.
Furthermore, the reduction step from a Petri net into a circuit with logarithmically
encoded transitions often performed better than the same
step with explicitly encoded transitions. However, several possibilities to reduce
the number of gates of the created circuit worsened the performance of some
benchmark families and improved the performance of others. Consequently, all
parameters are selectable by the user and a script is provided to compare different
settings. An overview of the selectable optimization parameters can be
found in the documentation of ADAMMC [12]. Our main improvement claims
can be retraced by the case study in Sect. 6.


6 Evaluation 


We conduct a case study based on SDN with a corresponding artifact [16]. The 
performance improvements of ADAMMC compared to the prototype [10] are 
summarized in Table 2. For realistic software-defined networks [19], one ingress 
and one egress switch are chosen at random. Two forwarding tables between the 
two switches and an update from the first to the second configuration are chosen 
at random. ADAMMC verifies that the update maintained connectivity between 
ingress and egress switch. The results are depicted in rows starting with T. 
For rows starting with F, we required connectivity of a random switch which is 
not in the forwarding tables. ADAMMC falsified this requirement for the update. 

The prototype implementation based on an explicit encoding can verify 
updates of networks with 7 switches and falsify updates of networks with 38 
switches. We optimize the explicit encoding to a logarithmic encoding and the 
number of switches for which updates can be verified increases to 17. More sig- 
nificantly, the parallel approach in combination with the logarithmic encoding 
leads to tremendous performance gains. The performance gains of an approach 
with inferior worst-case complexity are mainly due to the smaller complexity 
of the LTL formula created by the reduction. The encoding of SDN requires 
fairness assumptions for each transition. These assumptions (encoded in the run 
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Table 2. We compare the explicit and logarithmic encoding of the sequential approach 
with the parallel approach. The results are the average over five runs from an Intel i7- 
2700K CPU with 3.50 GHz, 32 GB RAM, and a timeout (TO) of 30 min. The runtimes 
are given in seconds. 


                            expl. enc.          log. enc.           parallel appr.
T/F  Network          #Sw   Alg.     Time  ⊨    Alg.     Time  ⊨    Alg.     Time  ⊨
T    Arpanet196912      4   IC3     12.08  ✓    IC3      9.89  ✓    IC3      2.18  ✓
T    Napnet             6   IC3    146.49  ✓    IC3     96.06  ✓    IC3      4.75  ✓
T    Heanet             7   IC3    806.81  ✓    IC3     84.62  ✓    IC3     30.30  ✓
T    HiberniaIreland    7   -          TO  ?    -          TO  ?    IC3     26.58  ✓
T    Arpanet19706       9   -          TO  ?    IC3    362.21  ✓    IC3     11.33  ✓
T    Nordu2005          9   -          TO  ?    -          TO  ?    IC3     12.67  ✓
T    Fatman            17   -          TO  ?    IC3   1543.34  ✓    IC3    162.17  ✓
T    Myren             37   -          TO  ?    -          TO  ?    IC3   1309.23  ✓
T    KentmanJan2011    38   -          TO  ?    -          TO  ?    IC3   1261.32  ✓
F    Arpanet196912      4   BMC3     2.18  ✗    BMC3     1.85  ✗    BMC3     1.97  ✗
F    Napnet             6   BMC2     4.17  ✗    BMC2     5.22  ✗    BMC3     1.48  ✗
F    Fatman            17   BMC3   168.78  ✗    BMC3   169.82  ✗    BMC3     6.72  ✗
F    Belnet2009        21   BMC2  1146.26  ✗    BMC2   611.81  ✗    BMC3    24.26  ✗
F    KentmanJan2011    38   BMC3   167.92  ✗    BMC3    86.44  ✗    BMC2     9.35  ✗
F    Latnet            69   -          TO  ?    -          TO  ?    BMC2   209.20  ✗
F    Ulaknet           82   -          TO  ?    -          TO  ?    BMC2  1043.74  ✗
Sum of runtimes (in hours):         82.99               79.15               30.31
Nb of TOs (of 230 exper.):            146                 138                   6


part of the formula) incur a blow-up through until operators in the sequential
approach but only need a disjunction in the parallel approach. Hence, the size
of networks for which ADAMMC can verify updates increases to 38 switches and
the size for which it can falsify updates increases to 82 switches. For rather small
networks, the tool needs only a few seconds to verify and falsify updates, which
makes it a practical option for operators when updating networks.


7 Related Work 


We refer to [21] for an introduction to SDN. Solutions for correctness of updates 
of software-defined networks include consistent updates [7,28], dynamic scheduling
[17], and incremental updates [18]. Both explicit and SMT-based model checking
[1,5,22,23,26,31] are used to verify software-defined networks. Closest to
our approach are models of networks as Kripke structures to use model checking
for synthesis of correct network updates [8,24]. The model checking subroutine
of the synthesizer assumes that each packet sees at most one updated switch. 
Our model checking routine does not make such an assumption. 

There is a significant number of model checking tools (e.g., [29,32]) for Petri 
nets and an annual model checking contest [20]. ADAMMC is restricted to safe 
Petri nets whereas other tools can handle bounded and colored Petri nets. At the 
same time, only ADAMMC accepts LTL formulas with places and transitions as 
atomic propositions. This is essential to express fairness in our SDN encoding. 


8 Conclusion 


We presented the tool ADAMMC with its three application domains: checking 
safe Petri nets with transits against Flow-LTL, checking concurrent updates of 
software-defined networks against common assumptions and specifications, and 
checking safe Petri nets against LTL. New algorithms allow ADAMMC to model 
check software-defined networks of realistic size: it can verify updates of networks 
with up to 38 switches and can falsify updates of networks with up to 82 switches. 
Abstract. Stutter-invariant properties play a special role in state-based
model checking: they are the properties that can be checked using par- 
tial order reduction (POR), an indispensable optimization. There are 
algorithms to decide whether an LTL formula or Büchi automaton (BA) 
specifies a stutter-invariant property, and to convert such a BA to a form 
that is appropriate for on-the-fly POR-based model checking. 

The interruptible properties play the same role in action-based model 
checking that stutter-invariant properties play in the state-based case. 
These are the properties that are invariant under the insertion or dele- 
tion of “invisible” actions. We present algorithms to decide whether an 
LTL formula or BA specifies an interruptible property, and show how a 
BA can be transformed to an interrupt normal form that can be used in 
an on-the-fly POR algorithm. We have implemented these algorithms in 
a new model checker named MCRERS, and demonstrate their effective- 
ness using the RERS 2019 benchmark suite. 


Keywords: Model checking · Action · Event · LTL · Stutter-invariant


1 Introduction 


To apply model checking to a concurrent system, one must formulate properties
that the system is expected to satisfy. A property may be expressed by specifying 
acceptable sequences of states, or by specifying acceptable sequences of actions— 
the events that cause the state to change. Each approach has advantages and 
disadvantages, and in any particular context one may be more appropriate than 
the other. 

In the state-based context, there is a rich theory involving automata, logic, 
and reduction for model checking. Some of the core ideas in this theory can be 
summarized as follows. First, the behavior of the concurrent system is represented
by a state-transition system T. One identifies a set AP of atomic propositions,
and each state of T is labeled by the set of propositions which hold at that
state. An execution passes through an infinite sequence of states, which defines
a trace, i.e., a sequence of subsets of AP. A property P is a set of traces, and T
satisfies P if every trace of T is in P.
Properties may be specified by formulas in a temporal logic, such as LTL [26].
There are algorithms (e.g., [37]) to convert an LTL formula $\phi$ to an equivalent
Büchi automaton (BA) $B_\phi$ with alphabet $2^{AP}$. (Properties may also be specified
directly using BAs.) The system T satisfies $\phi$ if and only if the language of the
synchronous product $T \otimes B_{\neg\phi}$ is empty. The emptiness of the language can be
determined on-the-fly, i.e., while the reachable states of the product are being
constructed.

A property P is stutter-invariant if it is closed under the insertion and deletion
of repetitions, i.e., $s_0 s_1 \cdots \in P \Leftrightarrow s_0^{i_0} s_1^{i_1} \cdots \in P$ holds for any positive
integers $i_0, i_1, \ldots$. Many algorithms are known for deciding whether an LTL
formula or a BA specifies a stutter-invariant property [22,24]. There is also an 
argument that only stutter-invariant properties should be used in practice. For 
example, suppose that a trace is formed by sampling the state of a system once 
every millisecond. If we sample the same system twice each millisecond, and 
there are no state changes in the sub-millisecond intervals, the second trace will 
be stutter-equivalent to the first. A meaningful property should be invariant 
under this choice of time resolution. 

Stutter-invariant properties are desirable for another reason: they admit the 
most significant optimization in model checking, partial order reduction (POR, 
[15,23,25]). At each state encountered in the exploration of the product space, 
an on-the-fly POR scheme produces a subset of the enabled transitions. Restrict- 
ing the search to the transitions in those subsets does not affect the language 
emptiness question. Recent work has revealed that the BA must have a certain 
form ("SI normal form") when POR is used with on-the-fly model checking,
but any BA with a stutter-invariant language can be easily transformed into SI 
normal form [27]. 

The purpose of this paper is to elaborate an analogous theory for event- 
based models. Event-based models of concurrency are widely used and have 
been extremely influential for over three decades. For example, process algebras, 
such as CSP, are event-based and use labeled transition systems (LTSs) for the 
semantic model. Event-based models are the main formalism used in
assume-guarantee reasoning (e.g., [10]), and in many other areas. There are mature
model checking and verification tools for process algebras and LTSs, which have
significant industrial applications; see, e.g., [13]. Temporal logics, including LTL,
CTL, and CTL*, have long been used to specify event-based systems [3,7,12]. 

We call the class of properties in the action context that are analogous to 
the stutter-invariant properties in the state context the interruptible properties 
(Sect. 3). These properties are invariant under “action stuttering” [34], i.e., the 
insertion or deletion of "invisible" actions. We present algorithms for deciding 
whether an LTL formula or a BA specifies an interruptible property (Theorems 
1 and 2); to the best of our knowledge, these are the first published algorithms 
for deciding this property of formulas or automata. 

Interruptible properties play the same role in action-based POR that stutter- 
invariant properties play in state-based POR. In particular, we present an action- 
based on-the-fly POR algorithm that works for interruptible properties (Sect. 4). 
As with the state-based case, the algorithm requires that the BA be in a certain
normal form. We introduce a novel interrupt normal form (Definition 11) for
this purpose, and show how any BA with an interruptible language can be trans- 
formed into that form. The relation to earlier work is discussed in Sect. 5. The 
effectiveness of these reduction techniques is demonstrated by applying them to 
problems in the 2019 RERS benchmark suite (Sect. 6). 


2 Preliminaries 


Let S be a set. $2^S$ denotes the set of all subsets of S. $S^*$ denotes the set of
finite sequences of elements of S; $S^\omega$ the infinite sequences. Let $\zeta = s_0 s_1 \cdots$ be
a (finite or infinite) sequence and $i \ge 0$. If $\zeta$ is finite of length n, assume $i < n$.
Then $\zeta(i)$ denotes the element $s_i$. For any $i \ge 0$, $\zeta^i$ denotes the suffix $s_i s_{i+1} \cdots$.
($\zeta^i$ is empty if $\zeta$ is finite and $i \ge n$.)

For $\zeta \in S^*$ and $\eta \in S^* \cup S^\omega$, $\zeta \circ \eta$ denotes the concatenation of $\zeta$ and $\eta$.

If $S \subseteq T$ and $\eta$ is a sequence of elements of T, $\eta|_S$ denotes the sequence
obtained by deleting from $\eta$ all elements not in S.
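
As a small illustration of the restriction operator, the following Python sketch computes $\eta|_S$ for finite sequences. The function name `restrict` is our own choice, and infinite sequences would need a lazy representation that is not shown here.

```python
def restrict(seq, keep):
    """Return seq|keep: the subsequence of seq whose elements lie in keep."""
    return [x for x in seq if x in keep]


# Deleting everything outside {a, b} from the sequence a x b y a.
print(restrict(["a", "x", "b", "y", "a"], {"a", "b"}))
```

The relative order of the surviving elements is preserved, which is what the interruptibility definitions later in the paper rely on.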


2.1 Linear Temporal Logic 
Let Act be a universal set of actions. We assume Act is infinite. 


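Definition 1. Form (the LTL formulas over Act) is the smallest set satisfying:


- $\mathit{true} \in \mathit{Form}$,
- if $a \in \mathit{Act}$ then $a \in \mathit{Form}$, and
- if f and g are in Form, so are $\neg f$, $f \wedge g$, $\mathbf{X} f$, and $f \,\mathbf{U}\, g$.


Additional operators are defined as shorthand for other formulas: $\mathit{false} = \neg\mathit{true}$,
$f \vee g = \neg((\neg f) \wedge (\neg g))$, $f \to g = (\neg f) \vee g$, $\mathbf{F} f = \mathit{true} \,\mathbf{U}\, f$, $\mathbf{G} f = \neg\mathbf{F}\neg f$, and
$f \,\mathbf{W}\, g = (f \,\mathbf{U}\, g) \vee \mathbf{G} f$.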


Definition 2. The alphabet of an LTL formula f, denoted $\alpha f$, is the set of
actions that occur syntactically within f.


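Definition 3. The action-based semantics of LTL is defined by the relation
$\zeta \models_A f$, where $\zeta \in \mathit{Act}^\omega$ and $f \in \mathit{Form}$, which is defined as follows:


- $\zeta \models_A \mathit{true}$,
- $\zeta \models_A a$ iff $\zeta(0) = a$,
- $\zeta \models_A \neg f$ iff $\zeta \not\models_A f$,
- $\zeta \models_A f \wedge g$ iff $\zeta \models_A f$ and $\zeta \models_A g$,
- $\zeta \models_A \mathbf{X} f$ iff $\zeta^1 \models_A f$, and
- $\zeta \models_A f \,\mathbf{U}\, g$ iff $\exists i \ge 0 .\, (\zeta^i \models_A g \wedge \forall j \in 0..i-1 .\, \zeta^j \models_A f)$.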
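
To make the clauses of Definition 3 concrete, here is a minimal Python sketch that evaluates a formula on an ultimately periodic word $u \circ v^\omega$. The tuple encoding of formulas, the function name `holds`, and the restriction to ultimately periodic words are our assumptions for illustration; they are not part of the paper.

```python
def holds(f, prefix, loop):
    """Decide whether the word prefix . loop^omega satisfies f (action semantics).

    Formulas are nested tuples: ("true",), ("act", a), ("not", f),
    ("and", f, g), ("X", f), ("U", f, g).
    """
    assert loop, "loop must be non-empty so that the word is infinite"
    word = list(prefix) + list(loop)
    n = len(word)
    # Each position has one successor; the last one wraps to the loop start.
    succ = [i + 1 for i in range(n)]
    succ[-1] = len(prefix)

    def sat(g):
        """Set of positions i such that the suffix starting at i satisfies g."""
        op = g[0]
        if op == "true":
            return set(range(n))
        if op == "act":
            return {i for i in range(n) if word[i] == g[1]}
        if op == "not":
            return set(range(n)) - sat(g[1])
        if op == "and":
            return sat(g[1]) & sat(g[2])
        if op == "X":
            target = sat(g[1])
            return {i for i in range(n) if succ[i] in target}
        if op == "U":
            left, result = sat(g[1]), set(sat(g[2]))
            changed = True
            while changed:  # least fixpoint of: g2 or (g1 and X(g1 U g2))
                new = {i for i in left if succ[i] in result} - result
                result |= new
                changed = bool(new)
            return result
        raise ValueError(f"unknown operator {op!r}")

    return 0 in sat(f)


def F(f):
    """Shorthand F f = true U f."""
    return ("U", ("true",), f)


a = ("act", "a")
print(holds(a, ["a"], ["b"]))                      # does a b^omega start with a?
print(holds(F(("and", a, ("X", F(a)))), ["a", "b", "a"], ["b"]))
```

Because exactly one action occurs at each position, the clause for an action simply compares it with the letter at that position.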


When using the action-based semantics, the logic is sometimes referred to as 
“Action LTL” or ALTL [11,12]. 
The state-based semantics is defined by a relation $\xi \models_S f$, where $\xi \in (2^{\mathit{Act}})^\omega$.
The definition of $\models_S$ is well-known, and is exactly the same as Definition 3,
except that $\xi \models_S a$ iff $a \in \xi(0)$. The action semantics are consistent with
the state semantics in the following sense. Let $f \in \mathit{Form}$, and $\zeta = a_0 a_1 \cdots \in \mathit{Act}^\omega$.
Let $\xi = \{a_0\}\{a_1\}\cdots \in (2^{\mathit{Act}})^\omega$. Then $\zeta \models_A f$ iff $\xi \models_S f$. The main
difference between the state- and action-based formalisms is that in the state-
based formalism, any number of atomic propositions can hold at each step. In
the action-based formalism, precisely one action occurs in each step.


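Definition 4. Let $f, g \in \mathit{Form}$. Define


- (action equivalence) $f \equiv_A g$ if $(\zeta \models_A f \Leftrightarrow \zeta \models_A g)$ for all $\zeta \in \mathit{Act}^\omega$,
- (state equivalence) $f \equiv_S g$ if $(\xi \models_S f \Leftrightarrow \xi \models_S g)$ for all $\xi \in (2^{\mathit{Act}})^\omega$.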


The following fact about the state-based semantics can be proved by induc- 
tion on the formula structure: 
Lemma 1. Let $f \in \mathit{Form}$ and $\xi = s_0 s_1 \cdots \in (2^{\mathit{Act}})^\omega$. Let $\xi' = s'_0 s'_1 \cdots$, where
$s'_i = \alpha f \cap s_i$. Then $\xi \models_S f$ iff $\xi' \models_S f$.

The following shows that action LTL, like ordinary state-based LTL, is a 
decidable logic: 


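Proposition 1. Let $f, g \in \mathit{Form}$, $A = \alpha f \cup \alpha g$, and

$$h = \mathbf{G}\Big[\Big(\bigwedge_{a \in A} \neg a\Big) \vee \bigvee_{a \in A} \Big(a \wedge \bigwedge_{b \in A \setminus \{a\}} \neg b\Big)\Big].$$

Then $f \equiv_A g \Leftrightarrow f \wedge h \equiv_S g \wedge h$. In particular, action equivalence is decidable.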


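Proof. Note the meaning of h: at each step in a state-based trace, at most one
element of A is true.

Suppose $f \wedge h \equiv_S g \wedge h$. Let $\zeta = a_0 a_1 \cdots \in \mathit{Act}^\omega$. Let $\xi = \{a_0\}\{a_1\}\cdots$. We
have $\xi \models_S h$. By the consistency of the state and action semantics, we have

$$\zeta \models_A f \Leftrightarrow \xi \models_S f \Leftrightarrow \xi \models_S f \wedge h \Leftrightarrow \xi \models_S g \wedge h \Leftrightarrow \xi \models_S g \Leftrightarrow \zeta \models_A g,$$

hence $f \equiv_A g$.

Suppose instead that $f \equiv_A g$. We wish to show $\xi \models_S f \wedge h \Leftrightarrow \xi \models_S g \wedge h$ for
any $\xi = s_0 s_1 \cdots \in (2^{\mathit{Act}})^\omega$. By Lemma 1, it suffices to assume $s_i \subseteq A$ for all i.

Let $\tau$ be any element of $\mathit{Act} \setminus A$. (Here we are using the fact that Act is infinite,
while A is finite.) If $|s_i| > 1$ for some i, then $\xi$ violates h and therefore violates both
$f \wedge h$ and $g \wedge h$. So suppose $|s_i| \le 1$ for all i, which means $\xi \models_S h$. Let $\zeta = a_0 a_1 \cdots$,
where $a_i$ is the sole member of $s_i$ if $|s_i| = 1$, or $\tau$ if $|s_i| = 0$. By Lemma 1, $\xi \models_S f$
iff $\{a_0\}\{a_1\}\cdots \models_S f$. By the consistency of the action and state semantics, this is
equivalent to $\zeta \models_A f$. A similar statement holds for g. Hence

$$\xi \models_S f \wedge h \Leftrightarrow \xi \models_S f \Leftrightarrow \zeta \models_A f \Leftrightarrow \zeta \models_A g \Leftrightarrow \xi \models_S g \Leftrightarrow \xi \models_S g \wedge h.$$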


The proposition reduces the question of action equivalence to one of ordinary 
(state) equivalence of LTL formulas, which is known to be decidable ([26], see 
also [36, Thm. 24]). 
Definition 5. For $A \subseteq \mathit{Act}$ and $f \in \mathit{Form}$ with $\alpha f \subseteq A$, let

$$\mathcal{L}(f, A) = \{\zeta \in A^\omega \mid \zeta \models_A f\}.$$


2.2 Büchi Automata 


Definition 6. A Büchi Automaton (BA) over Act is a tuple $(S, \Sigma, \to, S^0, F)$
where


1. S is a finite set of states,
2. $\Sigma$, the alphabet, is a finite subset of Act,
3. $\to\ \subseteq S \times \Sigma \times S$ is the transition relation,
4. $S^0 \subseteq S$ is the set of initial states, and
5. $F \subseteq S$ is the set of accepting states.


We will use the following notation and terminology for a BA B. The source of
a transition $(s, a, s')$ is s, the destination is $s'$, and the label is a. We write $s \xrightarrow{a} s'$
as shorthand for $(s, a, s') \in\ \to$, and $s \xrightarrow{a_1 a_2 \ldots a_n} s'$ for
$\exists s_1, s_2, \ldots, s_n \in S .\ s \xrightarrow{a_1} s_1 \xrightarrow{a_2} s_2 \cdots s_n = s'$.
For $a \in \Sigma$ and $s \in S$, we say a is enabled at s if $s \xrightarrow{a} s'$
for some $s' \in S$. The set of all actions enabled at s is denoted enabled(B, s).

For $s \in S$, a path in B starting from s is a (finite or infinite) sequence $\pi$ of
transitions such that (1) if $\pi$ is not empty, the source of $\pi(0)$ is s, and (2) the
destination of $\pi(i)$ is the source of $\pi(i + 1)$ for all i for which these are defined.
If $\pi$ is not empty, define first($\pi$) to be s; if $\pi$ is finite, define last($\pi$) to be the
destination of the last transition of $\pi$. We say $\pi$ spells the word $a_0 a_1 \cdots$, where
$a_i$ is the label of $\pi(i)$.

An infinite path is accepting if it visits a state in F infinitely often. An 
(accepting) trace starting from s is a word spelled by an (accepting) path starting 
from s. An (accepting) trace of B is an (accepting) trace starting from an initial 
state. The language of B, denoted $\mathcal{L}(B)$, is the set of all accepting traces of B.


Proposition 2. There is an algorithm that consumes any finite subset A of Act
and an $f \in \mathit{Form}$ with $\alpha f \subseteq A$, and produces a BA B with alphabet A such that
$\mathcal{L}(B) = \mathcal{L}(f, A)$.


Proof. There are well-known algorithms to produce a BA C with alphabet $2^A$
which accepts exactly the words satisfying f under the state semantics (e.g.,
[37]). Let B be the same as C, except the alphabet is A and there is a transition
$s \xrightarrow{a} s'$ in B iff there is a transition $s \xrightarrow{\{a\}} s'$ in C. We have

$$a_0 a_1 \cdots \in \mathcal{L}(B) \Leftrightarrow \{a_0\}\{a_1\}\cdots \in \mathcal{L}(C) \Leftrightarrow \{a_0\}\{a_1\}\cdots \models_S f \Leftrightarrow a_0 a_1 \cdots \in \mathcal{L}(f, A).$$
In practice, tools that convert LTL formulas to BAs produce an automaton
in which an edge is labeled by a propositional formula $\phi$ over $\alpha f$. Such an edge
represents a set of transitions, one for each $P \subseteq A$ for which $\phi$ holds for the
valuation that assigns true to each element of P and false to each element of
$A \setminus P$. In this case, the conversion to B entails creating one transition for each
$a \in A$ for which $\phi$ holds when true is assigned to a and false is assigned to all
other actions.
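
The conversion just described can be sketched in a few lines of Python. The representation is an assumption made for illustration: an edge is a triple `(source, phi, destination)` in which `phi` is a Python predicate over a valuation, and `expand_edges` is a name of our own choosing rather than the interface of any particular LTL-to-BA tool.

```python
def expand_edges(edges, alphabet):
    """Turn formula-labelled edges into action-labelled transitions.

    edges: iterable of (source, phi, destination), where phi maps a
           valuation (dict from action to bool) to a bool.
    Returns the transitions (source, a, destination) such that phi holds
    when a is true and every other action of the alphabet is false.
    """
    transitions = set()
    for src, phi, dst in edges:
        for a in alphabet:
            valuation = {b: (b == a) for b in alphabet}
            if phi(valuation):
                transitions.add((src, a, dst))
    return transitions


edges = [
    (0, lambda v: not v["a"], 0),        # label: not a
    (0, lambda v: v["a"] and v["b"], 1)  # label: a and b, never true here
]
print(sorted(expand_edges(edges, {"a", "b", "c"})))
```

An edge labelled by a conjunction of two distinct actions yields no transition at all, since only one action can occur per step.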


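Definition 7. Let $B_i = (S_i, \Sigma_i, \to_i, S_i^0, F_i)$ (i = 1, 2) denote two BAs over Act.
The parallel composition of $B_1$ and $B_2$ is the BA

$$B_1 \parallel B_2 = (S_1 \times S_2,\ \Sigma_1 \cup \Sigma_2,\ \to,\ S_1^0 \times S_2^0,\ F_1 \times F_2),$$

where $\to$ is defined by

$$\frac{s_1 \xrightarrow{a}_1 s_1' \quad a \notin \Sigma_2}{(s_1, s_2) \xrightarrow{a} (s_1', s_2)} \qquad
\frac{s_2 \xrightarrow{a}_2 s_2' \quad a \notin \Sigma_1}{(s_1, s_2) \xrightarrow{a} (s_1, s_2')} \qquad
\frac{s_1 \xrightarrow{a}_1 s_1' \quad s_2 \xrightarrow{a}_2 s_2'}{(s_1, s_2) \xrightarrow{a} (s_1', s_2')}.$$

If we flatten all tuples (e.g., identify $(S_1 \times S_2) \times S_3$ with $S_1 \times S_2 \times S_3$) then
$\parallel$ is an associative operator.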

Note that in the special case where the two automata have the same alphabet
($\Sigma_1 = \Sigma_2$), every action is synchronizing, and the parallel composition is the
usual "synchronous product." In this case, $\mathcal{L}(B_1 \parallel B_2) = \mathcal{L}(B_1) \cap \mathcal{L}(B_2)$.
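
The three rules of Definition 7 translate directly into code. The sketch below assumes a BA is stored as a Python dict with keys `states`, `alphabet`, `trans`, `init`, and `accepting`; this representation and the name `compose` are ours, chosen only for illustration.

```python
from itertools import product


def compose(b1, b2):
    """Parallel composition of two BAs following the three rules of Definition 7."""
    trans = set()
    for (s1, a, t1) in b1["trans"]:
        if a not in b2["alphabet"]:
            # First rule: b1 moves alone on an action unknown to b2.
            trans |= {((s1, s2), a, (t1, s2)) for s2 in b2["states"]}
        else:
            # Third rule: a shared action requires both automata to move.
            trans |= {((s1, s2), a, (t1, t2))
                      for (s2, b, t2) in b2["trans"] if b == a}
    for (s2, a, t2) in b2["trans"]:
        if a not in b1["alphabet"]:
            # Second rule: b2 moves alone on an action unknown to b1.
            trans |= {((s1, s2), a, (s1, t2)) for s1 in b1["states"]}
    return {
        "states": set(product(b1["states"], b2["states"])),
        "alphabet": b1["alphabet"] | b2["alphabet"],
        "trans": trans,
        "init": set(product(b1["init"], b2["init"])),
        "accepting": set(product(b1["accepting"], b2["accepting"])),
    }


b1 = {"states": {0, 1}, "alphabet": {"a", "x"},
      "trans": {(0, "a", 1), (1, "x", 0)}, "init": {0}, "accepting": {0, 1}}
b2 = {"states": {"p", "q"}, "alphabet": {"a"},
      "trans": {("p", "a", "q")}, "init": {"p"}, "accepting": {"q"}}
print(sorted(compose(b1, b2)["trans"]))
```

The sketch builds the full product state space; an on-the-fly checker would instead generate successors of reachable product states only.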


2.3 Labeled Transition Systems


Definition 8. A labeled transition system (LTS) over Act is a tuple $(Q, A, \to, q^0)$
for which $(Q, A, \to, \{q^0\}, Q)$ is a BA over Act. In other words, it is a BA in which
all states are accepting and there is only one initial state.


Definition 9. Let M be an LTS with alphabet A, and f an LTL formula with
$\alpha f \subseteq A$. We write $M \models f$ if $\mathcal{L}(M) \subseteq \mathcal{L}(f, A)$.


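The following observation is the basis of the automata-theoretic approach to
model checking (cf. [36, §4.2]):


Proposition 3. Let M be an LTS with alphabet A and f an LTL formula with
$\alpha f \subseteq A$. Let B be a BA with $\mathcal{L}(B) = \mathcal{L}(\neg f, A)$. Then $M \models f \Leftrightarrow \mathcal{L}(M \parallel B) = \emptyset$.


Proof. M and B have the same alphabet, so $\mathcal{L}(M \parallel B) = \mathcal{L}(M) \cap \mathcal{L}(B)$, hence

$$\mathcal{L}(M \parallel B) = \mathcal{L}(M) \cap \mathcal{L}(\neg f, A) = \mathcal{L}(M) \cap (A^\omega \setminus \mathcal{L}(f, A)) = \mathcal{L}(M) \setminus \mathcal{L}(f, A).$$

This set is empty iff $\mathcal{L}(M) \subseteq \mathcal{L}(f, A)$.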


There are various algorithms to determine language emptiness of a BA; in this 
paper we use the well-known Nested Depth First Search (NDFS) algorithm [2]. 
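
For readers unfamiliar with NDFS, the following Python sketch shows the textbook scheme: an outer depth-first search that, in postorder, starts an inner search from each accepting state to look for a cycle back to it. It is a sketch under our own assumptions (a successor dict, recursion instead of an explicit stack, the name `nonempty`) and not the implementation used in the tool.

```python
def nonempty(init, succ, accepting):
    """Return True iff some accepting state is reachable and lies on a cycle.

    init: iterable of initial states
    succ: dict mapping a state to an iterable of successor states
    accepting: set of accepting states
    """
    visited_outer, visited_inner = set(), set()

    def inner(s, seed):
        # Search for a path from s back to the seed accepting state.
        for t in succ.get(s, ()):
            if t == seed:
                return True
            if t not in visited_inner:
                visited_inner.add(t)
                if inner(t, seed):
                    return True
        return False

    def outer(s):
        visited_outer.add(s)
        for t in succ.get(s, ()):
            if t not in visited_outer and outer(t):
                return True
        # Postorder: all successors are finished before the inner search starts.
        return s in accepting and inner(s, s)

    return any(outer(s) for s in init if s not in visited_outer)


print(nonempty([0], {0: [1], 1: [2], 2: [1]}, {1}))
print(nonempty([0], {0: [1], 1: [1]}, {0}))
```

The inner search shares its visited set across all seeds, which is what keeps the total work linear in the size of the automaton.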
3 Interruptible Properties


3.1 Definition and Examples 


An LTS comes with an alphabet, which is a subset A of Act. By a property over
A we simply mean a subset P of $A^\omega$. We say a trace $\zeta \in A^\omega$ satisfies P if $\zeta \in P$.
We have already seen two ways to specify properties. An LTL formula f with
$\alpha f \subseteq A$ specifies the property $\mathcal{L}(f, A)$. A Büchi automaton B with alphabet A
specifies the property $\mathcal{L}(B)$. We next define a special class of properties:


Definition 10. Given sets $V \subseteq A \subseteq \mathit{Act}$, we say a property P over A is
V-interruptible if

$$\zeta|_V = \eta|_V \Rightarrow (\zeta \in P \Leftrightarrow \eta \in P) \quad \text{for all } \zeta, \eta \in A^\omega.$$

An LTL formula f is V-interruptible if $\mathcal{L}(f, \mathit{Act})$ is V-interruptible. We say f is
interruptible if f is $\alpha f$-interruptible. The set of all interruptible LTL formulas
is denoted Intrpt.


The set V is known as the visible set. The definition essentially says that the 
insertion or deletion of invisible actions (those in $A \setminus V$) has no bearing on whether
a trace satisfies P. Put another way, the question of whether a trace belongs to 
P is determined purely by its visible actions. The following collects some basic 
facts about interruptibility. All follow immediately from the definitions. 


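Proposition 4. Let $V \subseteq A \subseteq \mathit{Act}$, $P \subseteq A^\omega$ and $f, g \in \mathit{Form}$. Then all of the
following hold:


1. P is A-interruptible.
2. If P is V-interruptible, and $V \subseteq V'$, then P is $V'$-interruptible.
3. If f is interruptible and $\alpha f \subseteq A$, then $\mathcal{L}(f, A)$ is $\alpha f$-interruptible.
4. f is interruptible iff the following holds:

$$\forall \zeta, \eta \in \mathit{Act}^\omega .\ (\zeta|_{\alpha f} = \eta|_{\alpha f} \wedge \zeta \models_A f) \Rightarrow \eta \models_A f.$$

5. If $\alpha f = \alpha g$ and $f \equiv_A g$ then f is interruptible iff g is interruptible.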


Many, if not most, properties that arise in practice are V-interruptible for 
the set V of actions that are mentioned in the property. Assuming a, b, and c 
are distinct actions, we have: 


- For any $n \ge 0$, the property "a occurs at most n times" is {a}-interruptible,
  since the insertion or deletion of actions other than a cannot affect whether
  a word satisfies that property. The same is true for the properties "a occurs
  at least n times" and "a occurs exactly n times." These are examples of
  the bounded existence pattern with global scope in a widely used property
  specification pattern system [5]. LTL formulas in this category include $\mathbf{G}\neg a$
  (a occurs 0 times), $\mathbf{F}a$ (a occurs at least once), and $\mathbf{F}(a \wedge \mathbf{X}\mathbf{F}a)$ (a occurs at
  least twice).
- The property "after any occurrence of a, b eventually occurs", $\mathbf{G}(a \to \mathbf{F}b)$, is
  {a, b}-interruptible. This is the response pattern with global scope [5].
- The property "after any occurrence of a, c will eventually occur, and no b will
  occur until c", $\mathbf{G}(a \to ((\neg b)\,\mathbf{U}\,c))$, is {a, b, c}-interruptible. This is a variation
  on the absence pattern with after-until scope, and is used to specify mutual
  exclusion [5].


On the other hand, the property "a occurs at time 0" (LTL formula a) is
not {a}-interruptible. Neither is "an event other than a occurs at least once"
($\mathbf{F}\neg a$) nor "only a occurs" ($\mathbf{G}a$). The property "every occurrence of a is followed
immediately by b," formula $\mathbf{G}(a \to \mathbf{X}b)$, is not {a, b}-interruptible. The property
"after any occurrence of a, c eventually occurs and until then only b occurs,"
$\mathbf{G}(a \to \mathbf{X}(b\,\mathbf{U}\,c))$, is not {a, b, c}-interruptible.

The following provides a useful way to show that two interruptible properties
are equal: 


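Lemma 2. Suppose $V \subseteq A \subseteq \mathit{Act}$ and $P_1$ and $P_2$ are V-interruptible properties
over A. Let $F = V^\omega \cup V^* \circ (A \setminus V)^\omega$. Then $P_1 = P_2$ iff $P_1 \cap F = P_2 \cap F$.


Proof. Assume $P_1 \cap F = P_2 \cap F$. Let $\zeta \in P_1$. If $\zeta|_V$ is infinite, then since
$\zeta|_V|_V = \zeta|_V$, and $P_1$ is V-interruptible, $\zeta|_V \in P_1$. But $\zeta|_V \in V^\omega$, so $\zeta|_V \in P_1 \cap F$,
and therefore $\zeta|_V \in P_2$. Since $P_2$ is V-interruptible, $\zeta \in P_2$.

If $\zeta|_V$ is finite, there is a prefix $\theta$ of $\zeta$ such that $\zeta = \theta \circ \eta$, with $\eta \in (A \setminus V)^\omega$.
Let $\xi = \theta|_V \circ \eta$. We have $\xi \in V^* \circ (A \setminus V)^\omega$ and $\xi|_V = \zeta|_V$, hence $\xi \in P_1 \cap F$.
Therefore $\xi \in P_2$, and since $P_2$ is V-interruptible, $\zeta \in P_2$. This shows
$P_1 \subseteq P_2$; the reverse inclusion follows by symmetry, and the converse
implication of the lemma is immediate.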


The elements of F are known as the V-interrupt-free words over A. 


3.2 Decidability of Interruptibility of LTL Formulas 


We next show that interruptibility is a decidable property of LTL formulas. 
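Define intrpt: Form → Form as follows. Given $f \in \mathit{Form}$, let $V = \alpha f$ and
$\hat{V} = \bigvee_{a \in V} a$, and define $\beta$: Form → Form by

$$\begin{aligned}
\beta(\mathit{true}) &= \mathit{true} \\
\beta(a) &= (\neg\hat{V})\,\mathbf{U}\,a \\
\beta(\neg f_1) &= \neg\beta(f_1) \\
\beta(f_1 \wedge f_2) &= \beta(f_1) \wedge \beta(f_2) \\
\beta(\mathbf{X} f_1) &= ((\neg\hat{V})\,\mathbf{U}\,(\hat{V} \wedge \mathbf{X}\beta(f_1))) \vee ((\mathbf{G}\neg\hat{V}) \wedge \mathbf{X}\beta(f_1)) \\
\beta(f_1\,\mathbf{U}\,f_2) &= \beta(f_1)\,\mathbf{U}\,\beta(f_2)
\end{aligned}$$

for $a \in \mathit{Act}$ and $f_1, f_2 \in \mathit{Form}$. Let intrpt(f) = $\beta(f)$.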
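
The recursive definition of $\beta$ can be transcribed almost literally. The following Python sketch assumes formulas are nested tuples over the core operators of Definition 1; the helper names (`Or`, `G`, `alphabet`, `intrpt`) and the encoding are ours, and disjunction and G are expanded through the shorthand given after Definition 1.

```python
from functools import reduce

TRUE = ("true",)


def Not(f): return ("not", f)
def And(f, g): return ("and", f, g)
def Or(f, g): return Not(And(Not(f), Not(g)))   # f or g = not(not f and not g)
def X(f): return ("X", f)
def U(f, g): return ("U", f, g)
def G(f): return Not(U(TRUE, Not(f)))           # G f = not F not f


def alphabet(f):
    """The set of actions occurring syntactically in f."""
    if f[0] == "act":
        return {f[1]}
    return set().union(*(alphabet(g) for g in f[1:]))


def intrpt(f):
    """Build intrpt(f) = beta(f), with V taken to be the alphabet of f."""
    acts = [("act", a) for a in sorted(alphabet(f))]
    v_hat = reduce(Or, acts) if acts else Not(TRUE)  # empty disjunction is false

    def beta(g):
        op = g[0]
        if op == "true":
            return TRUE
        if op == "act":
            return U(Not(v_hat), g)
        if op == "not":
            return Not(beta(g[1]))
        if op == "and":
            return And(beta(g[1]), beta(g[2]))
        if op == "X":
            nxt = X(beta(g[1]))
            return Or(U(Not(v_hat), And(v_hat, nxt)), And(G(Not(v_hat)), nxt))
        if op == "U":
            return U(beta(g[1]), beta(g[2]))
        raise ValueError(f"unknown operator {op!r}")

    return beta(f)


print(intrpt(("act", "a")))
```

The output formula grows linearly in the number of X operators but each occurrence of $\hat{V}$ has size proportional to $|V|$, so sharing subformulas would matter in a real implementation.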
Theorem 1. Let f be an LTL formula over Act. The following hold: 


1. intrpt(f) is interruptible. 
2. f is interruptible iff intrpt(f) $\equiv_A$ f.


In particular, interruptibility of LTL formulas is decidable. 


Action-Based Model Checking: Logic, Automata, and Reduction 85 


Before proving Theorem 1, we give some intuition regarding the definition of
intrpt. Function $\beta$ can be thought of as consuming a property on V-interrupt-free
words (i.e., words in $V^\omega \cup V^* \circ (A \setminus V)^\omega$) and extending it to a property on all
words ($A^\omega$). It is designed so that $\beta(g)$ is V-interruptible and agrees with g on
V-interrupt-free words. For example, the formula a means "a is the first action"
(in an interrupt-free word), which extends to the property "a is the first visible
action" (in an arbitrary word). The formula $\mathbf{X} f_1$ states "$f_1$ holds after removing
the first action," so $\beta(\mathbf{X} f_1)$ should declare "$\beta(f_1)$ holds after removing the prefix
ending in the first visible action." That is almost correct, but there is also the
possibility that an element of $A^\omega$ has no visible action, which is the reason for
the second clause in the definition of $\beta(\mathbf{X} f_1)$.

The remainder of this subsection is devoted to the proof of Theorem 1. First 
note that intrpt(f) and f have the same alphabet, i.e., $\alpha\,\mathrm{intrpt}(f) = V$.


Proof of Part 1. Say a subformula g of f is good if $\beta(g)$ is V-interruptible,
i.e.,

$$\forall \zeta, \eta \in \mathit{Act}^\omega .\ \zeta|_V = \eta|_V \Rightarrow (\zeta \models_A \beta(g) \Leftrightarrow \eta \models_A \beta(g)).$$

We show by induction on formula structure that every subformula of f is good.
The case g = f will show that intrpt(f) is interruptible. Assume throughout that
$\zeta|_V = \eta|_V$.

If g = true then $\beta(g)$ = true, so g is clearly good.

If g = a for some $a \in \mathit{Act}$, then $\zeta \models_A \beta(g) = (\neg\hat{V})\,\mathbf{U}\,a$ iff $\zeta|_V$ is non-empty
and $\zeta|_V(0) = a$. Since this depends only on $\zeta|_V$, g is good.

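If $g = \neg f_1$, and $f_1$ is good, then g is good because

$$\zeta \models_A \beta(g) \Leftrightarrow \zeta \not\models_A \beta(f_1) \Leftrightarrow \eta \not\models_A \beta(f_1) \Leftrightarrow \eta \models_A \beta(g).$$

If $g = f_1 \wedge f_2$, and $f_1$ and $f_2$ are good, then g is good because

$$\zeta \models_A \beta(g) \Leftrightarrow \zeta \models_A \beta(f_1) \wedge \zeta \models_A \beta(f_2)
\Leftrightarrow \eta \models_A \beta(f_1) \wedge \eta \models_A \beta(f_2) \Leftrightarrow \eta \models_A \beta(g).$$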


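Suppose $g = \mathbf{X} f_1$ and $f_1$ is good. There are two cases:


- Case 1: $\zeta|_V$ is empty. Then no suffix of $\zeta$ or $\eta$ satisfies $\hat{V}$. Hence

  $$\theta \models_A \beta(g) \Leftrightarrow \theta \models_A \mathbf{X}\beta(f_1) \Leftrightarrow \theta^1 \models_A \beta(f_1) \quad (\theta \in \{\zeta, \eta\}).$$

  Moreover, $\zeta^1|_V = \eta^1|_V$ (as both are empty), and $f_1$ is good, so we have
  $\zeta^1 \models_A \beta(f_1) \Leftrightarrow \eta^1 \models_A \beta(f_1)$. These show $\zeta \models_A \beta(g) \Leftrightarrow \eta \models_A \beta(g)$.
- Case 2: $\zeta|_V$ is nonempty. Let i be the index of the first occurrence of an
  element of V in $\zeta$, and j the similar index for $\eta$. We have

  $$\zeta^{i+1}|_V = (\zeta|_V)^1 = (\eta|_V)^1 = \eta^{j+1}|_V.$$

  As $f_1$ is good, it follows that $\zeta^{i+1} \models_A \beta(f_1) \Leftrightarrow \eta^{j+1} \models_A \beta(f_1)$. Hence

  $$\zeta \models_A \beta(g) \Leftrightarrow \zeta^{i+1} \models_A \beta(f_1) \Leftrightarrow \eta^{j+1} \models_A \beta(f_1) \Leftrightarrow \eta \models_A \beta(g).$$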
Suppose $g = f_1\,\mathbf{U}\,f_2$ and $f_1$ and $f_2$ are good. We have $\beta(g) = \beta(f_1)\,\mathbf{U}\,\beta(f_2)$.
If $\zeta \models_A \beta(g)$ then there exists $i \ge 0$ such that $\zeta^i \models_A \beta(f_2)$ and $\zeta^j \models_A \beta(f_1)$
for $j < i$. Now there is some $i' \ge 0$ such that $\eta^{i'}|_V = \zeta^i|_V$ and for all $j' < i'$,
there is some $j < i$ such that $\eta^{j'}|_V = \zeta^j|_V$. It follows that $\eta \models_A \beta(g)$. Hence g is
good.

Proof of Part 2. Suppose first that intrpt(f) $\equiv_A$ f. From part 1, intrpt(f) is
interruptible, so Proposition 4(5) implies f is interruptible.

Suppose instead that f is interruptible. We wish to show intrpt(f) $\equiv_A$ f. By
Lemma 2, it suffices to show the two formulas agree on V-interrupt-free words.
We will show by induction that for each subformula g of f, $\zeta \models_A g \Leftrightarrow \zeta \models_A \beta(g)$
for all V-interrupt-free $\zeta$. The case g = f will complete the proof.

If g = true, $\beta(g)$ = true and the condition clearly holds.

If g = a for some $a \in \mathit{Act}$, $\zeta \models_A \beta(g) \Leftrightarrow \zeta \models_A (\neg\hat{V})\,\mathbf{U}\,a \Leftrightarrow \zeta \models_A a$, as $\zeta$
is V-interrupt-free.

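If $g = \neg f_1$ and the inductive hypothesis holds for $f_1$, then

$$\zeta \models_A \beta(g) \Leftrightarrow \zeta \not\models_A \beta(f_1) \Leftrightarrow \zeta \not\models_A f_1 \Leftrightarrow \zeta \models_A g.$$

If $g = f_1 \wedge f_2$ and the inductive hypothesis holds for $f_1$ and $f_2$ then

$$\zeta \models_A \beta(g) \Leftrightarrow \zeta \models_A \beta(f_1) \wedge \zeta \models_A \beta(f_2) \Leftrightarrow \zeta \models_A f_1 \wedge \zeta \models_A f_2 \Leftrightarrow \zeta \models_A g.$$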


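Suppose $g = \mathbf{X} f_1$ and the inductive hypothesis holds for $f_1$. Note that any
suffix of a V-interrupt-free word, e.g., $\zeta^1$, is also V-interrupt-free. If $\zeta|_V$ is empty,

$$\zeta \models_A \beta(g) \Leftrightarrow \zeta \models_A \mathbf{X}\beta(f_1) \Leftrightarrow \zeta^1 \models_A \beta(f_1) \Leftrightarrow \zeta^1 \models_A f_1 \Leftrightarrow \zeta \models_A g.$$

If $\zeta|_V$ is nonempty, then $\zeta \models_A \hat{V}$, so

$$\zeta \models_A \beta(g) \Leftrightarrow \zeta \models_A (\neg\hat{V})\,\mathbf{U}\,(\hat{V} \wedge \mathbf{X}\beta(f_1)) \Leftrightarrow \zeta \models_A \mathbf{X}\beta(f_1)
\Leftrightarrow \zeta^1 \models_A \beta(f_1) \Leftrightarrow \zeta^1 \models_A f_1 \Leftrightarrow \zeta \models_A g.$$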


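If $g = f_1\,\mathbf{U}\,f_2$, then applying the inductive hypothesis to $f_1$ and $f_2$ yields

$$\zeta \models_A g \Leftrightarrow \exists i \ge 0 .\ (\zeta^i \models_A f_2 \wedge \forall j < i .\ \zeta^j \models_A f_1)
\Leftrightarrow \exists i \ge 0 .\ (\zeta^i \models_A \beta(f_2) \wedge \forall j < i .\ \zeta^j \models_A \beta(f_1)) \Leftrightarrow \zeta \models_A \beta(g).$$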


Decidability follows from part 2 and Proposition 1. This completes the proof 
of Theorem 1. 


Remark 1. The definition of $\beta(\mathbf{X} f_1)$ is convenient for the proof but shorter
definitions also work. If the formula $f_1$ is satisfied by some word $\zeta \in (A \setminus V)^\omega$, then
all such $\zeta$ satisfy $f_1$, and the clause $(\mathbf{G}\neg\hat{V}) \wedge \mathbf{X}\beta(f_1)$ can be replaced by $\mathbf{G}\neg\hat{V}$.
Otherwise, that clause can be removed altogether. One can determine whether a
formula is satisfied by such a word by replacing every occurrence of every action
with false.
3.3 Generation of Interruptible LTL Formulas


The following can be used to show that many formulas are interruptible. It 
establishes a kind of parity pattern involving a class of positive formulas (Pos) 
and a class of negative formulas (Neg). It is proved in [28]. 


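Proposition 5. There exist Pos, Neg $\subseteq$ Form such that (i) for all $f, f' \in \mathit{Form}$,

$$(f \in \mathit{Pos} \wedge f' \equiv_A f) \Rightarrow f' \in \mathit{Pos}$$
$$(f \in \mathit{Neg} \wedge f' \equiv_A f) \Rightarrow f' \in \mathit{Neg},$$

and (ii) for all $a \in \mathit{Act}$, $f_1, f_2 \in \mathit{Intrpt}$, $g_1, g_2 \in \mathit{Pos}$, and $h_1, h_2 \in \mathit{Neg}$,

$$\mathit{false},\ a,\ \neg h_1,\ g_1 \wedge g_2,\ g_1 \vee g_2,\ a \wedge f_1,\ a \wedge \mathbf{X} f_1 \in \mathit{Pos}$$
$$\mathit{true},\ \neg a,\ \neg g_1,\ h_1 \wedge h_2,\ h_1 \vee h_2,\ \neg a \vee f_1,\ \neg a \vee \mathbf{X} f_1 \in \mathit{Neg}$$
$$\mathit{true},\ \mathit{false},\ f_1 \wedge f_2,\ f_1 \vee f_2,\ \neg f_1,\ \mathbf{F} g_1,\ \mathbf{G} h_1,\ f_1\,\mathbf{U}\,f_2,\ h_1\,\mathbf{U}\,g_1,\ h_1\,\mathbf{U}\,f_1 \in \mathit{Intrpt}.$$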


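Consider the examples from Sect. 3.1. The formula a is positive, so $\mathbf{F}a$ is
interruptible. Since $\neg a$ is negative, $\mathbf{G}\neg a$ is interruptible. Since $\mathbf{F}a$ is interruptible,
$a \wedge \mathbf{X}\mathbf{F}a$ is positive, hence $\mathbf{F}(a \wedge \mathbf{X}\mathbf{F}a)$ is interruptible.

Formula $\mathbf{G}(a \to \mathbf{F}b)$ is seen to be interruptible as follows. Since $b \in \mathit{Pos}$,
$\mathbf{F}b \in \mathit{Intrpt}$, whence $\neg a \vee \mathbf{F}b \in \mathit{Neg}$. Since this last formula is action-equivalent
to $a \to \mathbf{F}b$, we have $a \to \mathbf{F}b \in \mathit{Neg}$. Therefore $\mathbf{G}(a \to \mathbf{F}b) \in \mathit{Intrpt}$.

Similarly, $(\neg b)\,\mathbf{U}\,c \in \mathit{Intrpt}$, so $a \to \mathbf{X}((\neg b)\,\mathbf{U}\,c) \in \mathit{Neg}$. This negative formula
is action-equivalent to $a \to ((\neg b)\,\mathbf{U}\,c)$, whence $\mathbf{G}(a \to ((\neg b)\,\mathbf{U}\,c)) \in \mathit{Intrpt}$.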
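
The derivations above follow a mechanical pattern, which the following Python sketch captures as a purely syntactic classifier for rule (ii) of Proposition 5. It is an illustration under our own assumptions: formulas are tuples with `or`, `F`, and `G` as primitive nodes, the rules are matched only in the argument order in which they are written, and closure under action equivalence (rule (i)) is not attempted. An empty result therefore means only that no derivation was found, not that the formula fails to be interruptible.

```python
def classes(f):
    """Subset of {"Pos", "Neg", "Intrpt"} derivable for f by matching rule (ii)."""
    op = f[0]
    if op == "true":
        return {"Neg", "Intrpt"}
    if op == "false":
        return {"Pos", "Intrpt"}
    if op == "act":
        return {"Pos"}
    if op == "X":
        return set()  # no rule has X as the outermost operator

    def intr_or_next(g):
        # g is in Intrpt, or g = X g' with g' in Intrpt.
        return "Intrpt" in classes(g) or (g[0] == "X" and "Intrpt" in classes(g[1]))

    out = set()
    if op == "not":
        c = classes(f[1])
        if "Neg" in c:
            out.add("Pos")
        if "Pos" in c:  # also covers "not a", since every action is in Pos
            out.add("Neg")
        if "Intrpt" in c:
            out.add("Intrpt")
    elif op in ("and", "or"):
        out = classes(f[1]) & classes(f[2])  # each class is closed under and/or
        if op == "and" and f[1][0] == "act" and intr_or_next(f[2]):
            out.add("Pos")
        is_neg_act = f[1][0] == "not" and f[1][1][0] == "act"
        if op == "or" and is_neg_act and intr_or_next(f[2]):
            out.add("Neg")
    elif op == "F":
        # F g1 with g1 in Pos; also F f = true U f with true and f in Intrpt.
        if classes(f[1]) & {"Pos", "Intrpt"}:
            out.add("Intrpt")
    elif op == "G":
        # G h1 with h1 in Neg; also G f = not F not f with f in Intrpt.
        if classes(f[1]) & {"Neg", "Intrpt"}:
            out.add("Intrpt")
    elif op == "U":
        left, right = classes(f[1]), classes(f[2])
        if ("Intrpt" in left and "Intrpt" in right) or \
           ("Neg" in left and right & {"Pos", "Intrpt"}):
            out.add("Intrpt")
    else:
        raise ValueError(f"unknown operator {op!r}")
    return out


a, b = ("act", "a"), ("act", "b")
response = ("G", ("or", ("not", a), ("F", b)))   # G(a -> F b) written as G(not a or F b)
print(classes(response))
```

Implications have to be written as $\neg a \vee \ldots$ before classification, mirroring the appeal to action equivalence in the text.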

Note that Intrpt and the set of stutter-invariant formulas are not comparable.
For example, $f = \mathbf{F}(a \wedge \mathbf{X}\mathbf{F}a)$ is interruptible, but not stutter-invariant. In
fact f is not action-equivalent to any stutter-invariant formula g, since if there
were such a g, the sequence $aab^\omega$ would satisfy g, but the stutter-equivalent
sequence $ab^\omega$ cannot satisfy g. Conversely, the formulas a and $\mathbf{G}a$ are both
stutter-invariant, but neither is interruptible. The formula $\mathbf{F}a$ is both
stutter-invariant and interruptible. Finally, the formula $\mathbf{X}a$ is neither stutter-invariant
nor interruptible.


3.4 Decidability of Interruptibility of Büchi Automata 


Definition 11. Let B be a BA with alphabet A, $V \subseteq A$ (the visible actions),
and $I = A \setminus V$ (the invisible actions). We say B is in V-interrupt normal form
if the following hold for any $x \in I$, $a \in A$, and states $s_1$, $s_2$, and $s_3$:


1. If $s_1 \xrightarrow{a} s_2$ then B has a state $s_1'$ such that $s_1 \xrightarrow{x} s_1' \xrightarrow{a} s_2$.
2. If $s_1 \xrightarrow{x} s_2 \xrightarrow{a} s_3$ then $s_1 \xrightarrow{a} s_3$ and if $s_2$ is accepting then $s_1$ or $s_3$ is accepting.
3. If $s_1 \xrightarrow{x} s_2$ then $s_1 \xrightarrow{y} s_2$ for all $y \in I$.
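
The three conditions of Definition 11 are finite checks over the transition relation, as the following Python sketch illustrates. The function name and the representation of the automaton (a set of `(source, label, destination)` triples plus a set of accepting states) are our assumptions; the check is a direct, unoptimized reading of the definition.

```python
def is_interrupt_normal_form(trans, accepting, alphabet, visible):
    """Check conditions 1-3 of Definition 11 for a BA given by its transitions."""
    invisible = alphabet - visible
    states = {s for (s, _, _) in trans} | {t for (_, _, t) in trans}

    def step(s, a, t):
        return (s, a, t) in trans

    for x in invisible:
        # Condition 1: every transition can be preceded by an x-step.
        for (s1, a, s2) in trans:
            if not any(step(s1, x, m) and step(m, a, s2) for m in states):
                return False
        for (s1, label, s2) in trans:
            if label != x:
                continue
            # Condition 3: every invisible label connects the same two states.
            if not all(step(s1, y, s2) for y in invisible):
                return False
            # Condition 2: the x-step can be skipped without losing acceptance.
            for (u, a, s3) in trans:
                if u != s2:
                    continue
                if not step(s1, a, s3):
                    return False
                if s2 in accepting and s1 not in accepting and s3 not in accepting:
                    return False
    return True


# A single accepting state with self-loops on every action is in normal form.
loops = {(0, "a", 0), (0, "x", 0)}
print(is_interrupt_normal_form(loops, {0}, {"a", "x"}, {"a"}))
```

An automaton whose alphabet contains an invisible action but which has no transition labelled by it fails condition 1 as soon as it has any transition at all.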


Proposition 6. Suppose B is in V-interrupt normal form. Then $\mathcal{L}(B)$ is
V-interruptible.

Proof. Suppose $\zeta, \eta \in A^\omega$, $\zeta \in \mathcal{L}(B)$, and $\zeta|_V = \eta|_V$. We wish to show $\eta \in \mathcal{L}(B)$.
Let $\pi$ be an accepting path for $\zeta$.

Assume $\zeta|_V$ is infinite. By Definition 11(2), we can remove all invisible
transitions from the accepting path $\pi$, and the result is an accepting path that spells
$\zeta|_V$. By Definition 11(1), we can insert any arbitrary finite sequence of invisible
transitions between two consecutive visible transitions; we can therefore construct
an accepting path for $\eta$.

If $\zeta|_V$ is finite, proceed as above to form an accepting path which spells a finite
prefix of $\eta$ followed by an infinite word of invisible actions. By Definition 11(3),
that infinite suffix can be transformed to spell any infinite word of invisibles,
and in that way one obtains an accepting path for $\eta$.


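Given any BA $B = (S, A, T, S^0, F)$ and a visible set $V \subseteq A$, define a BA
norm(B, V) as follows: if V = A, norm(B, V) = B, otherwise norm(B, V) is
$\hat{B} = (\hat{S}, A, \hat{T}, \hat{S}^0, \hat{F})$, where

$$\begin{aligned}
D &= \{s \in S \mid \text{there is an accepting path from } s \text{ with all labels in } I\} \\
\hat{S} &= \{\hat{u} \mid u \in S\} \cup \{u^\sharp \mid u \in F \setminus D\} \cup \{\mathrm{DIV}\} \\
\hat{S}^0 &= \{\hat{u} \mid u \in S^0\} \\
\hat{F} &= \{\hat{u} \mid u \in F\} \cup \{\mathrm{DIV}\} \\
\hat{T} &= \{(\hat{u}, a, \hat{v}) \mid a \in V \wedge u, v \in S \wedge (u, a, v) \in T\}\ \cup \\
&\quad\ \{(\hat{u}, x, \hat{u}) \mid x \in I \wedge u \in D \cup (S \setminus F)\}\ \cup \\
&\quad\ \{(\mathrm{DIV}, x, \mathrm{DIV}) \mid x \in I\}\ \cup \\
&\quad\ \{(\hat{u}, x, \mathrm{DIV}) \mid x \in I \wedge u \in D \setminus F\}\ \cup \\
&\quad\ \{(\hat{u}, x, u^\sharp), (u^\sharp, x, u^\sharp) \mid x \in I \wedge u \in F \setminus D\}\ \cup \\
&\quad\ \{(u^\sharp, a, \hat{v}) \mid a \in V \wedge u \in F \setminus D \wedge v \in S \wedge (u, a, v) \in T\}
\end{aligned}$$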

The set $\hat{S}$ consists of the original states $\hat{u}$, the sharp states $u^\sharp$, and one
additional state DIV. The mapping from S to $\hat{S}$ defined by $u \mapsto \hat{u}$ is injective
and preserves acceptability and visible transitions, i.e., for any $u, v \in S$ and
$a \in V$, $u \xrightarrow{a} v \Leftrightarrow \hat{u} \xrightarrow{a} \hat{v}$. It follows that paths in B in which all labels are
visible correspond one-to-one with paths through original states in $\hat{B}$ in which
all labels are visible. Note that every invisible transition in $\hat{B}$ is a self-loop or
ends in a sharp state or DIV. Moreover, all transitions in $\hat{B}$ ending in a sharp
state or DIV are invisible.
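
The construction of norm(B, V) can be sketched in Python as follows. The encoding is an assumption for illustration: original states $\hat{u}$ are represented as `("orig", u)`, sharp states as `("sharp", u)`, and the extra state as the string `"DIV"`. The set D is computed by finding accepting states that lie on a cycle of invisible transitions and then collecting every state that can reach one of them through invisible transitions.

```python
def norm(states, alphabet, trans, init, accepting, visible):
    """Sketch of norm(B, V); returns (states, alphabet, trans, init, accepting)."""
    invisible = alphabet - visible
    if not invisible:
        return states, alphabet, trans, init, accepting  # V = A: B is unchanged

    # Successors along invisible transitions only.
    inv_succ = {s: set() for s in states}
    for (u, a, v) in trans:
        if a in invisible:
            inv_succ[u].add(v)

    def reach(start):
        """States reachable from start by one or more invisible steps."""
        seen, stack = set(), list(inv_succ[start])
        while stack:
            s = stack.pop()
            if s not in seen:
                seen.add(s)
                stack.extend(inv_succ[s])
        return seen

    reachable = {s: reach(s) for s in states}
    # Accepting states on an invisible cycle can be visited infinitely often.
    recurrent = {f for f in accepting if f in reachable[f]}
    D = {s for s in states if s in recurrent or reachable[s] & recurrent}

    def hat(u): return ("orig", u)
    def sharp(u): return ("sharp", u)

    new_states = ({hat(u) for u in states}
                  | {sharp(u) for u in accepting - D} | {"DIV"})
    new_init = {hat(u) for u in init}
    new_acc = {hat(u) for u in accepting} | {"DIV"}
    new_trans = {(hat(u), a, hat(v)) for (u, a, v) in trans if a in visible}
    for x in invisible:
        new_trans.add(("DIV", x, "DIV"))
        for u in states:
            if u in D or u not in accepting:
                new_trans.add((hat(u), x, hat(u)))
            if u in D and u not in accepting:
                new_trans.add((hat(u), x, "DIV"))
            if u in accepting and u not in D:
                new_trans.add((hat(u), x, sharp(u)))
                new_trans.add((sharp(u), x, sharp(u)))
    new_trans |= {(sharp(u), a, hat(v)) for (u, a, v) in trans
                  if a in visible and u in accepting - D}
    return new_states, alphabet, new_trans, new_init, new_acc


# One accepting state looping on the visible action a; x is invisible.
result = norm({0}, {"a", "x"}, {(0, "a", 0)}, {0}, {0}, {"a"})
print(sorted(result[2], key=str))
```

In this small example the state 0 is accepting but not in D, so it receives a sharp copy that absorbs invisible actions until the next visible a.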


Proposition 7. For any BA B with alphabet A, and any visible set $V \subseteq A$,
norm(B, V) is in V-interrupt normal form.


Proof. To see Definition 11(1), suppose $s_1 \xrightarrow{a} s_2$. If $s_1 \xrightarrow{x} s_1$, take $s_1' = s_1$.
Otherwise, $s_1 = \hat{u}$ for some $u \in F \setminus D$, and we can take $s_1' = u^\sharp$.

For Definition 11(2), suppose $s_1 \xrightarrow{x} s_2 \xrightarrow{a} s_3$. We need to show $s_1 \xrightarrow{a} s_3$ and if
$s_2$ is accepting then $s_1$ or $s_3$ is accepting. If $s_1 = s_2$, the result is clear, so assume
$s_1 \ne s_2$. There are then two cases: $s_2 = \mathrm{DIV}$ or $s_2 = u^\sharp$ for some $u \in F \setminus D$.
If $s_2 = \mathrm{DIV}$, then $a \in I$ and $s_3 = \mathrm{DIV}$, and we have $s_1 \xrightarrow{a} \mathrm{DIV}$. As DIV is
accepting, the desired conclusion holds.

If $s_2 = u^\sharp$, then $s_1 = \hat{u}$, which is accepting. There are again two cases: either
$s_3 = u^\sharp$ or $s_3 = \hat{v}$ for some $v \in S$. If $s_3 = u^\sharp$ then $a \in I$ and $\hat{u} \xrightarrow{a} u^\sharp$, as required.
If $s_3 = \hat{v}$, then $a \in V$ and therefore $u \xrightarrow{a} v$, hence $\hat{u} \xrightarrow{a} \hat{v}$, as required.

Definition 11(3) is clear from the definition of $\hat{T}$.


Theorem 2. £(B) is V-interruptible iff C(norm(B,V)) = £(B). In particular 
interruptibility for Büchi Automata is decidable. 


Proof. Let $P_1 = \mathcal{L}(B)$ and $P_2 = \mathcal{L}(\mathrm{norm}(B, V))$.
By Proposition 7, norm(B, V) is in V-interrupt normal form, so by Proposition 6,
$P_2$ is V-interruptible. Hence one direction is clear: if $P_1 = P_2$, then
$P_1$ is V-interruptible.

So suppose $P_1$ is V-interruptible. We wish to show $P_1 = P_2$. By Lemma 2, it
suffices to show the two languages contain the same V-interrupt-free words.

Suppose $\zeta$ is a V-interrupt-free word in $P_1$. If $\zeta \in V^\omega$ then
an accepting path $\theta$ in $B$ maps to the accepting path $\hat{\theta}$ in
$\hat{B}$, and $\zeta \in P_2$. So assume $\zeta \in V^* I^\omega$. Then an
accepting path in $B$ has a prefix $\theta$ of visible transitions ending in a
state $u \in D$. That prefix corresponds to a path $\hat{\theta}$ in $\hat{B}$
ending in $\hat{u}$. As $u \in D$, $\hat{u} \xrightarrow{x} \hat{u}$ for all
$x \in I$. If $u$ is accepting, we get an accepting path for $\zeta$ that follows
$\hat{\theta}$ and then loops at $\hat{u}$. If $u$ is not accepting then
$u \in D \setminus F$, and $\hat{u} \xrightarrow{x} \mathrm{DIV}$ for all
$x \in I$. Since DIV is accepting and $\mathrm{DIV} \xrightarrow{x} \mathrm{DIV}$
for all $x \in I$, we again get an accepting path for $\zeta$ in $\hat{B}$.

Suppose now that $\zeta$ is a V-interrupt-free word in $P_2$. Assume
$\zeta \in V^\omega$. An accepting path for $\zeta$ cannot pass through a sharp
state or DIV, because only invisible transitions end in those states. So the
path passes through only original states, and therefore corresponds to an
accepting path in $B$.

Suppose $\zeta \in V^* I^\omega$. An accepting path for $\zeta$ in $\hat{B}$
consists of a prefix $\hat{\theta}$ of visible transitions followed by an
infinite accepting path $\xi$ of invisible transitions. As above, $\hat{\theta}$
corresponds to a path $\theta$ in $B$ ending in a state $u$.

We claim that $\xi$ cannot pass through a sharp state. This is because all
invisible transitions departing from a sharp state are self-loops. But sharp
states are not accepting, while $\xi$ is an accepting path of invisible
transitions. It follows that each transition in $\xi$ is a self-loop or
terminates in DIV.

We now claim $u \in D$. For suppose the first transition in $\xi$ is a self-loop
on $\hat{u}$. According to the definition of $\hat{T}$, this implies
$u \in D \cup (S \setminus F)$. Hence, if $u \notin D$ then $u$ is not accepting,
and all invisible transitions departing from $\hat{u}$ are self-loops,
contradicting the fact that $\xi$ is an accepting path. If, on the other hand,
the first transition in $\xi$ is $\hat{u} \xrightarrow{x} \mathrm{DIV}$, for some
$x \in I$, then the definition of $\hat{T}$ implies $u \in D$, establishing the
claim.

So $u \in D$, i.e., there is an accepting path $\rho$ in $B$ starting from $u$
and consisting of all invisible transitions. The accepting path obtained by
concatenating $\theta$ and $\rho$ spells a word which, projected onto $V$, equals
$\zeta|_V$. Since $P_1$ is V-interruptible, $\zeta \in P_1$. This completes the
proof that $P_1 = P_2$.

The theorem reduces the problem of determining V-interruptibility to a prob- 
lem of determining equivalence of two Büchi Automata, which can be done using 
language intersection, complement, and emptiness algorithms for BAs [37]. 

4 On-the-Fly Partial Order Reduction


4.1 General Theory and Soundness Theorem 


Let $M = (Q, A, T, q^0)$ be an LTS, $V \subseteq A$, and
$B = (S, A, \delta, S^0, F)$ a V-interruptible BA. The goal of on-the-fly POR is
to explore a sub-automaton $R'$ of $R = M \parallel B$ with the property that
$\mathcal{L}(R) \neq \emptyset \Rightarrow \mathcal{L}(R') \neq \emptyset$.

A function $\mathrm{amp}\colon Q \times S \to 2^A$ is an ample selector if
$\mathrm{amp}(q, s) \subseteq \mathrm{enabled}(M, q)$ for all $q \in Q$,
$s \in S$. Each $\mathrm{amp}(q, s)$ is an ample set. An ample selector
determines a BA $R' = \mathrm{reduced}(R, \mathrm{amp})$ which has the same
states, accepting states, and initial state as $R$, but only a subset of the
transitions:


$$R' = (Q \times S, A, \delta', \{q^0\} \times S^0, Q \times F)$$
$$\delta' = \{((q, s), a, (q', s')) \mid a \in \mathrm{amp}(q, s) \wedge (q, a, q') \in T \wedge (s, a, s') \in \delta\}.$$
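
The definition of the reduced transition relation translates directly into a set comprehension. The following Python sketch is illustrative only; the function name `reduced`, the encoding of transitions as triples, and the example data are assumptions of the sketch.

```python
def reduced(T, delta, amp):
    """Transitions of the reduced product.

    T: LTS transitions (q, a, q2); delta: BA transitions (s, a, s2);
    amp: function mapping (q, s) to a set of actions.
    """
    return {((q, s), a, (q2, s2))
            for (q, a, q2) in T
            for (s, b, s2) in delta
            if a == b and a in amp(q, s)}


# Tiny example: from product state (0, "s") only action "a" is kept.
T = {(0, "a", 1), (0, "b", 2)}
delta = {("s", "a", "s"), ("s", "b", "s")}
amp = lambda q, s: {"a"} if q == 0 else set()
R_reduced = reduced(T, delta, amp)
```

The full product is obtained from the same function by letting `amp` return every enabled action.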

We now define some constraints on an ample selector that will be used to
guarantee the reduced product space has nonempty language if the full space
does. First we need the usual notion of independence:


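Definition 12. Let $M$ be an LTS with alphabet $A$, and $a, b \in A$. We say $a$
and $b$ are independent if both of the following hold for all states $q$ and
$q'$ of $M$:


1. $(q \xrightarrow{a} q' \wedge b \in \mathrm{enabled}(M, q)) \Rightarrow b \in \mathrm{enabled}(M, q')$
2. $q \xrightarrow{ab} q' \Leftrightarrow q \xrightarrow{ba} q'$.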


We say a and b are dependent if they are not independent. 
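
For a small explicit LTS, the two conditions of Definition 12 can be checked by brute force. The sketch below reads them as: executing a never disables b, and the action sequences ab and ba lead from each state to the same set of states. The helper names and the example transition sets are assumptions of the sketch; note that actions may be nondeterministic, so successor sets are compared rather than single states.

```python
def enabled(T, q):
    """Actions with at least one outgoing transition from q."""
    return {a for (p, a, _) in T if p == q}


def step(T, q, a):
    """States reachable from q by one a-transition."""
    return {r for (p, b, r) in T if p == q and b == a}


def two_step(T, q, a, b):
    """States reachable from q by an a-transition followed by a b-transition."""
    return {r2 for r in step(T, q, a) for r2 in step(T, r, b)}


def independent(states, T, a, b):
    for q in states:
        # Condition 1: a does not disable b.
        if b in enabled(T, q):
            if any(b not in enabled(T, q2) for q2 in step(T, q, a)):
                return False
        # Condition 2: ab and ba reach the same states.
        if two_step(T, q, a, b) != two_step(T, q, b, a):
            return False
    return True


# A commuting diamond, and a variant in which a disables b.
DIAMOND = {(0, "a", 1), (0, "b", 2), (1, "b", 3), (2, "a", 3)}
CONFLICT = {(0, "a", 1), (0, "b", 2)}
```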


Note that, in contrast with [1], we do not assume actions are deterministic. We 
can now define the four constraints: 


C0 For all $q \in Q$, $s \in S$: $\mathrm{enabled}(M, q) \neq \emptyset \Rightarrow \mathrm{amp}(q, s) \neq \emptyset$.

C1 For all $q \in Q$, $s \in S$: on any trace in $M$ starting from $q$, no action
outside of $\mathrm{amp}(q, s)$ but dependent on an action in $\mathrm{amp}(q, s)$
can occur without an action in $\mathrm{amp}(q, s)$ occurring first.

C2 For all $q \in Q$, $s \in S$: if $\mathrm{amp}(q, s) \neq \mathrm{enabled}(M, q)$,
then $\mathrm{amp}(q, s) \cap V = \emptyset$.

C3 For all $a \in A$: on any cycle in $R'$ for which $a$ is enabled in $R$ at
each state, there is some state $(q, s)$ on the cycle for which
$a \in \mathrm{amp}(q, s)$.
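
Constraints C0 and C2 are local to a product state and can be checked directly, whereas C1 and C3 quantify over traces and cycles. A minimal Python sketch of the local checks follows; the function name `check_c0_c2` and the representation of `enabled` and `amp` as Python callables are assumptions made for illustration.

```python
def check_c0_c2(Q, S, enabled, amp, visible):
    """Return a list of (constraint, q, s) triples that are violated."""
    violations = []
    for q in Q:
        for s in S:
            en, am = enabled(q), amp(q, s)
            # C0: a nonempty enabled set needs a nonempty ample set.
            if en and not am:
                violations.append(("C0", q, s))
            # C2: a proper ample set must not contain visible actions.
            if am != en and am & visible:
                violations.append(("C2", q, s))
    return violations


# One LTS state with enabled actions a (visible) and b (invisible).
EN = lambda q: {"a", "b"}
ok = check_c0_c2({0}, {"s"}, EN, lambda q, s: {"b"}, {"a"})
bad_c2 = check_c0_c2({0}, {"s"}, EN, lambda q, s: {"a"}, {"a"})
bad_c0 = check_c0_c2({0}, {"s"}, EN, lambda q, s: set(), {"a"})
```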


Theorem 3. Let $M$ be an LTS with alphabet $A$, $V \subseteq A$, $B$ a BA with
alphabet $A$ in V-interrupt normal form, $R = M \parallel B$, and amp an ample
selector satisfying C0-C3. Then
$\mathcal{L}(\mathrm{reduced}(R, \mathrm{amp})) = \emptyset \Rightarrow \mathcal{L}(R) = \emptyset$.


The requirement that B be in interrupt normal form is necessary. A coun- 
terexample when that condition is not met is given in Fig. 1. Note a and b are 
independent, and a is invisible. The ample set for product states 0 and 1 is {a}; 
the ample set for product state 2 is {a,b}. Hence C3 holds because a state on 
the sole cycle is fully enabled. After normalizing B (and removing unreachable 
states), this problem goes away: in any reduced space, the ample sets must retain 
the a-transitions, and state $0^\sharp$ must be fully enabled since it has an a-self-loop,
so the accepting cycle involving the two states will remain. 

The remainder of this section is devoted to the proof of Theorem 3. The
proof is similar to that of the analogous theorem in the state-based case [27], 
but some changes are necessary and we include the proof for completeness. 

Let $\theta$ be an accepting path in $R$. An infinite sequence of accepting paths
$\pi_0, \pi_1, \ldots$ will be constructed, where $\pi_0 = \theta$. For each
$i \geq 0$, $\pi_i$ will be decomposed as $\eta_i \circ \theta_i$, where $\eta_i$
is a finite path of length $i$ in $R'$, $\theta_i$ is an infinite path, and
$\eta_i$ is a prefix of $\eta_{i+1}$. For $i = 0$, $\eta_0$ is empty and
$\theta_0 = \theta$.

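Assume $i \geq 0$ and we have defined $\eta_j$ and $\theta_j$ for $j \leq i$. Write

$$\theta_i = (q_0, s_0) \xrightarrow{a_1} (q_1, s_1) \xrightarrow{a_2} \cdots \qquad (1)$$

Then $\eta_{i+1}$ and $\theta_{i+1}$ are defined as follows. Let
$E = \mathrm{amp}(q_0, s_0)$. There are two cases: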


Case 1: $a_1 \in E$. Let $\eta_{i+1}$ be the path obtained by appending the first
transition of $\theta_i$ to $\eta_i$, and $\theta_{i+1}$ the path obtained by
removing the first transition from $\theta_i$.


Case 2: $a_1 \notin E$. Then there are two sub-cases:


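Case 2a: Some operation in $E$ occurs in $\theta_i$. Let $n$ be the index of the
first such occurrence. By C1, $a_j$ and $a_n$ are independent for
$1 \leq j < n$. By repeated application of the independence property, there is a
path in $M$ of the form

$$q_0 \xrightarrow{a_n} q_1' \xrightarrow{a_1} q_2' \xrightarrow{a_2} \cdots \xrightarrow{a_{n-2}} q_{n-1}' \xrightarrow{a_{n-1}} q_n \xrightarrow{a_{n+1}} q_{n+1} \xrightarrow{a_{n+2}} \cdots$$

By C2, $a_n$ is invisible. By Definition 11, $B$ has an accepting path of the form

$$s_0 \xrightarrow{a_n} s_0' \xrightarrow{a_1} s_1 \xrightarrow{a_2} \cdots \xrightarrow{a_{n-2}} s_{n-2} \xrightarrow{a_{n-1}} s_{n-1} \xrightarrow{a_{n+1}} s_{n+1} \xrightarrow{a_{n+2}} \cdots$$

Composing these two paths yields a path in $R$. Removing the first transition
(labeled $a_n$) yields $\theta_{i+1}$. Appending that transition to $\eta_i$
yields $\eta_{i+1}$.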


Fig.1. Counterexample to Theorem 3 if B is not in interrupt normal form: (a) the 
LTS M, (b) the BA B representing GF, (c) the product space—dashed edges are in 
the full, but not reduced, space, and (d) the result of normalizing B and removing 
unreachable states, which also depicts the resulting full product space. 

Case 2b: No operation in $E$ occurs in $\theta_i$. By C0, $E$ is nonempty. Let
$b \in E$. By C1, every action in $\theta_i$ is independent of $b$, and by C2,
$b$ is invisible. As in the case above, we obtain a path in $R$

$$(q_0, s_0) \xrightarrow{b} (q_1', s_0') \xrightarrow{a_1} (q_2', s_1) \xrightarrow{a_2} (q_3', s_2) \xrightarrow{a_3} \cdots$$

and define $\theta_{i+1}$ and $\eta_{i+1}$ as above.

Let $\eta$ be the limit of the $\eta_i$, i.e., $\eta(i) = \eta_{i+1}(i)$. It is
clear that $\eta$ is an infinite path in $R'$, but we must show it passes through
an accepting state infinitely often. To see this, define integers $d_i$ for
$i \geq 0$ as follows. Let $\xi_i = s_0 s_1 \cdots$ be the sequence of BA states
traced by $\theta_i$. Let $d_i$ be the minimum $j \geq 0$ such that $s_j$ is
accepting. Note that $d_i = 0$ iff $\mathrm{last}(\eta_i)$ is accepting.

Suppose $i \geq 0$ and $d_i > 0$. If Case 1 holds, then $d_{i+1} = d_i - 1$,
since $\xi_{i+1}$ is $\xi_i$ with its first state removed. It is not hard to see
that if Case 2 holds, $d_{i+1} \leq d_i$. Note that in Case 2a, if $d_i = n$, the
accepting state $s_n$ is removed, but Definition 11(2) guarantees that at least
one of $s_{n-1}$ and $s_{n+1}$ is accepting. In the worst case ($s_{n-1}$ is not
accepting), we still have $d_{i+1} = n$.

We claim there are an infinite number of $i \geq 0$ such that Case 1 holds.
Otherwise, there is some $i \geq 0$ such that Case 2 holds for all $j \geq i$.
Let $a$ be the first action in $\theta_i$. Then for all $j \geq i$, $a$ is the
first action of $\theta_j$ and $a$ is not in the ample set of
$\mathrm{last}(\eta_j)$. Since the number of states of $R$ is finite, there is
some $k > i$ such that $\mathrm{last}(\eta_k) = \mathrm{last}(\eta_i)$. Hence
there is a cycle in $R'$ for which $a$ is always enabled but never in the ample
set, contradicting C3.

If $\eta$ does not pass through an accepting state infinitely often, there is
some $i \geq 0$ such that for all $j \geq i$, $\mathrm{first}(\theta_j)$ is not
accepting. But then $(d_j)_{j \geq i}$ is a nonincreasing sequence of positive
integers which strictly decreases infinitely often, a contradiction.


4.2 Ample Sets for a Parallel Composition of LTSs 


We now describe the specific method used by McRERS to select ample sets.
Since this method is similar to existing approaches, such as [32, Algorithm 4.3], 
we just outline the main ideas. 

Let $n \geq 1$, $P = \{1, \ldots, n\}$, and let $M_1, \ldots, M_n$ be LTSs over
Act. Write $M_i = (Q_i, A_i, \to_i, q_i^0)$ and

$$M = M_1 \parallel \cdots \parallel M_n = (Q, A, \to, q^0).$$

For $a \in A$, let $\mathrm{procs}(a) = \{i \in P \mid a \in A_i\}$. It can be
shown that if $a$ and $b$ are dependent actions, then
$\mathrm{procs}(a) \cap \mathrm{procs}(b) \neq \emptyset$.
Let $q = (q_1, \ldots, q_n) \in Q$ and $E_i = \mathrm{enabled}(M_i, q_i)$ for
$i \in P$. Let

$$R_q = \{(i, j) \in P \times P \mid E_i \cap A_j \neq \emptyset\}.$$

Suppose $C \subseteq P$ is closed under $R_q$, i.e., for all $i \in C$ and
$j \in P$, $(i, j) \in R_q \Rightarrow j \in C$. This implies that if $a \in E_i$
for some $i \in C$ then $\mathrm{procs}(a) \subseteq C$. Define

$$\mathrm{enabled}(C, q) = \mathrm{enabled}(M, q) \cap \bigcup_{i \in C} A_i.$$

Let $E = \mathrm{enabled}(C, q)$. Note $E \subseteq \bigcup_{i \in C} E_i$. Hence
for any $a \in E$, $\mathrm{procs}(a) \subseteq C$.


Lemma 3. On any trace in M starting from q, no action outside of E but 
dependent on an action in E can occur without an action in E occurring first. 


Proof. Let $\zeta$ be a trace in $M$ starting from $q$, such that no element of
$E$ occurs in $\zeta$. We claim no action involving $C$ (i.e., an action $a$ for
which $\mathrm{procs}(a) \cap C \neq \emptyset$) can occur in $\zeta$. Otherwise,
let $x$ be the first such action. Then $x \in E_i$, for some $i \in C$, so
$\mathrm{procs}(x) \subseteq C$. As $x \notin E$, $x \notin \mathrm{enabled}(M, q)$.
So some earlier action $y$ in $\zeta$ caused $x$ to become enabled, and therefore
$\mathrm{procs}(x) \cap \mathrm{procs}(y) \neq \emptyset$, hence
$\mathrm{procs}(y) \cap C \neq \emptyset$, contradicting the assumption that $x$
was the first action involving $C$ in $\zeta$.

Now any action $b$ dependent on an action $a \in E$ must satisfy
$\mathrm{procs}(a) \cap \mathrm{procs}(b) \neq \emptyset$. Since
$\mathrm{procs}(a) \subseteq C$, $\mathrm{procs}(b) \cap C$ is nonempty. Hence no
action dependent on an action in $E$ can occur in $\zeta$.


We now describe how to find an ample set in the context of NDFS. Let $(q, s)$
be a new product state that has just been pushed onto the outer DFS stack. The
relation $R_q$ defined above gives $P$ the structure of a directed graph. Suppose
that graph has a strongly connected component $C_0$ such that all of the
following hold for $E = \mathrm{enabled}(C_0, q)$:


1. $E \neq \emptyset$,
2. $E \cap V = \emptyset$,
3. $\mathrm{enabled}(C', q) = \emptyset$ for all SCCs $C'$ reachable from $C_0$
   other than $C_0$, and
4. $E$ does not contain a “back edge”, i.e., if $(q, s) \xrightarrow{a} \sigma$
   for some $a \in E$ and $\sigma \in Q \times S$, then $\sigma$ is not on the
   outer DFS stack.


Then set $\mathrm{amp}(q, s) = E$. If no such SCC exists, set
$\mathrm{amp}(q, s) = \mathrm{enabled}(M, q)$. It follows that C0-C3 hold. Note
that the union $C$ of all SCCs reachable from $C_0$ is closed under $R_q$, and
$\mathrm{enabled}(C, q) = E$, so Lemma 3 guarantees C1. For C3, we actually have
the stronger condition that in any cycle in the reduced space, at least one
state is fully enabled. In our implementation, the SCCs are computed using
Tarjan’s algorithm. Among all SCCs $C_0$ satisfying the conditions above, we
choose one for which $|\mathrm{enabled}(C_0, q)|$ is minimal.
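
The selection of an ample set from the graph on processes can be sketched as follows. This Python sketch rests on several assumptions that are not taken from McRERS: components are assumed to synchronize on shared actions (an action is enabled in M exactly when every component having it in its alphabet enables it), SCCs are found by a simple mutual-reachability test rather than Tarjan's algorithm, and the back-edge condition is omitted because it needs access to the DFS stack. All function names are hypothetical.

```python
def enabled_global(alphabets, local_enabled):
    """Actions enabled in M, assuming synchronization on shared actions."""
    actions = set().union(*alphabets)
    return {a for a in actions
            if all(a in local_enabled[i]
                   for i, A in enumerate(alphabets) if a in A)}


def reachable_from(edges, i):
    """Processes reachable from i in the process graph (including i)."""
    seen, stack = {i}, [i]
    while stack:
        u = stack.pop()
        for v in edges[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def choose_ample(alphabets, local_enabled, visible):
    n = len(alphabets)
    en = enabled_global(alphabets, local_enabled)
    # Edge i -> j iff some action enabled locally in i belongs to A_j.
    edges = [{j for j in range(n) if local_enabled[i] & alphabets[j]}
             for i in range(n)]
    reach = [reachable_from(edges, i) for i in range(n)]

    def enabled_in(C):
        return {a for a in en if any(a in alphabets[i] for i in C)}

    best = None
    for i in range(n):
        scc = {j for j in reach[i] if i in reach[j]}
        E = enabled_in(scc)
        # Conditions 1-3: nonempty, invisible, nothing enabled further down.
        if E and not (E & visible) and not enabled_in(reach[i] - scc):
            if best is None or len(E) < len(best):
                best = E
    return en if best is None else best


# Three components; "s" is shared by components 0 and 1, "c" is visible.
ALPHABETS = [{"a", "s"}, {"s", "b"}, {"c"}]
ample = choose_ample(ALPHABETS, [{"a"}, {"b"}, {"c"}], {"c"})
```

When no component qualifies, the function falls back to the full enabled set, mirroring the fully enabled case in the text.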

One known issue when combining NDFS with on-the-fly POR is that the
inner DFS must explore the same subspace as the outer DFS, i.e., amp must be
a deterministic function of its input $(q, s)$ [18]. To accomplish this, McRERS
stores one additional integer $j$ in the state: $j$ is the root node of the SCC
$C_0$, or $-1$ if the state is fully enabled. The outer search saves $j$ in the
state, and the inner search uses $j$ to reconstruct the SCC $C_0$ and the ample
set $E$.


5 Related Work 


There has been significant earlier research on the use of partial order reduction
to model check LTSs (or the closely related concept of process algebras); see, e.g.,
[14,16,30-33,35]. To understand how this previous work relates to this paper,
we must explain a subtle, but important, distinction concerning how a property
is specified. In much of this literature, a property of an LTS with alphabet $A$
is essentially a pair $\pi = (V, T)$, where $V \subseteq A$ is a set of visible
actions and $T$ is a set of (finite and infinite) words over $V$. A property in
this sense specifies acceptable behaviors after invisible actions have been
removed. (See, e.g., Def. 2.4 and preceding comments in [32].) We can translate
$\pi$ to a property $P$ in our sense by taking its inverse image under the
projection map:


$$P = \{\zeta \in A^\omega \mid \zeta|_V \in T\}.$$


Note that P is V-interruptible by definition. Hence the need to distinguish inter- 
ruptible properties does not arise in this context. 

Much of the earlier work on POR for LTSs deals with the “offline” case, i.e., 
the construction of a subspace of M that preserves certain classes of properties. 
In contrast, Theorem 3 deals with an on-the-fly algorithm, i.e., the construction 
of a subspace of M || B. The on-the-fly approach is an essential optimization in 
model checking, but recent work in the state-based formalism has shown that 
offline POR schemes do not always generalize easily to on-the-fly algorithms [27]. 

One work that does describe an on-the-fly model checking algorithm for LTSs
is [32] (see also [17], which deals with the same ideas in a state formalism). The 
property is specified by a tester process B. Consistent with the notion of property 
described above, the alphabet of B does not include the invisible actions. Hence, 
in the parallel composition M || B, the tester does not move when M executes 
an invisible action. In order to specify both finite and infinite words of visible 
actions, the tester has two kinds of accepting states: “livelock monitor states” 
and “infinite trace monitor states.” (Two additional classes of states for detecting 
other kinds of violations are not relevant to the discussion here.) A version of the 
stubborn set theory is used to define the reduced space, and a special condition is 
used to solve the “ignoring problem” (instead of our C3). It would be interesting 
to compare this algorithm with the one described here. 

There are many algorithms for reducing or even minimizing the size of an 
LTS while preserving various properties, e.g., bisimulation equivalence [8] or 
divergence preserving bisimilarity [6]. These algorithms could be applied to the 
individual components of a parallel composition (taking all visible and commu- 
nication actions to be “visible”), as a preprocessing step before beginning the 
model checking search. An exploration of these algorithms, and how they impact 
POR, is beyond the scope of this paper, but we hope to explore that avenue in 
future work. 

The RERS Challenge [9,19-21] is an annual event involving a number of 
different categories of large model checking problems. The “parallel LTL cate- 
gory,” offered from 2016 on, is directly relevant to this paper. Each problem in 
that category consists of a Graphviz “dot” file specifying an LTS as a parallel 
composition, and a text file containing 20 LTL formulas. The goal is to identify 
the formulas satisfied by the LTS. The solutions are initially known only to the 
organizers, and are published after the event. The RERS semantics for LTSs, 
LTL, and satisfiability are exactly the same as in this paper. 

The methods for generating the LTS and the properties are complicated, and
have varied over the years, but are designed to satisfy certain hardness
guarantees. The approach described in [29] is “... based on the weak refinement
... of convergent systems which preserves an interesting class of temporal
properties.”
It can be seen that the properties preserved by weak refinement are exactly the 
interruptible properties. While [29] does not describe a method for determin- 
ing whether a property is interruptible, the authors have informed us that they 
developed a sufficient condition for an LTL formula to be interruptible, and used 
this in combination with a random method to generate the formulas for 2016 
and 2019. Our analysis (Sect. 6) confirms that all formulas from 2016 and 2019 
are interruptible, while 2017 and 2018 contain some non-interruptible formulas. 

There is a well-known way to translate a system and property expressed 
in an action-based formalism to a state-based formalism. The idea is to add a 
shared variable last which records the last action executed. An LTL formula over 
actions can be transformed to one over states by replacing each action a with the 
predicate last = a. This is the approach taken in the Promela representations of 
the parallel problems provided with the RERS challenges. 

This translation is semantics-preserving but performance-destroying. Every 
transition writes to the shared variable last, so any state-based POR scheme 
will assume that no two transitions commute. Furthermore, since the property 
references last, all transitions are visible. This effectively disables POR, even 
when the property is stutter-invariant, as can be seen in the poor performance 
of SPIN on the RERS Promela models (Sect. 6). It is possible that there are
more effective SPIN translations; [34, §2.2], for example, suggests not updating
last on invisible actions, and adding a global boolean variable that is flipped on 
every visible action (in addition to updating last). We note that this would also 
require modifying the LTL formula, or specifying the property in some other 
way. In any case, it suggests another interesting avenue for future work. 


6 Experimental Results and Conclusions 


We implemented a model checker named McRERS based on the algorithms
described in this paper. McRERS is a library and set of command line tools.
It is written in sequential C and uses the Spot library [4] for several tasks: (1)
determining equivalence of LTL formulas, (2) determining language equivalence
of BAs, and (3) converting an LTL formula to a BA. The source code for McRERS
and all artifacts related to the experiments discussed in this section are
available at https://vsl.cis.udel.edu/cav2020. The experiments were run on
an 8-core 3.7 GHz Intel Xeon W-2145 Linux machine with 256 GB RAM, though
McRERS is a sequential program and most experiments required much less
memory.

As described in Sect. 5, each edition of RERS includes a number of problems,
each of which comes with 20 LTL formulas. The numbers of problems for
years 2016-2019 are, in order, 20, 15, 3, and 9, for a total of 47 problems, or
$47 \times 20 = 940$ distinct model checking tasks. (Some formulas become
identical after renaming propositions.) We used the McRERS property analyzer to
analyze these formulas to determine which are interruptible; the algorithm used
is based on Theorem 1. The results show that all formulas from 2016 and 2019
are interruptible, which agrees with the expectations of the RERS organizers. In
2017, 22 of the 300 formulas are not interruptible; these include


- G F ¬a111_SIGTRAP,
- G[a71_SIGVTALRM → X ¬a71_SIGVTALRM], and
- G[(a59_SIGUSR1 ∧ X[(¬a112_SIGHUP) U a59_SIGUSR1]) → F G a104_SIGPIPE].


In 2018, 3 of the 60 formulas are not interruptible. In summary, only 25 of the
940 tasks involve non-interruptible formulas. The total runtime for the analysis
of all 940 formulas was 6 s.

We next used the McRERS automaton analyzer to create BAs from each of
the interruptible formulas, and then to determine which of these Spot-generated 
BAs was not in interrupt normal form. This uses a straightforward algorithm 
that iterates over all states and checks the conditions of Definition 11. For each 
BA not in normal form, the analyzer transforms it to normal form using function 
norm of Sect. 3.4. Interestingly, all of the Spot-generated BAs in 2016 and 2019 
were already in normal form. Four of the BAs from interruptible formulas in 2017 
were not in normal form; all of these formulas had the form F[a V ((25)Wzc)]. 
In 2018, 6 interruptible formulas have non-normal BAs; these formulas have 
several different non-isomorphic forms, some of which are quite complex. The 
details can be seen on the online archive. The total runtime for this analysis 
(including writing all BAs to a file) was 11 s.

The McRERS model checker parses RERS “dot” and property files to construct
an internal representation of a parallel composition
$M = M_1 \parallel \cdots \parallel M_n$ of LTSs and a list of LTL formulas.
Each formula $f$ is converted to a BA $B$; if $f$
is interruptible and B is not already in normal form, B is transformed to normal 
form. The NDFS algorithm is used to determine language emptiness, and if f is 
interruptible, the POR scheme described in Sect. 4 is also used. States are saved 
in a hash table. 

One other simple optimization is used regardless of whether $f$ is interruptible.
Let $\alpha M$ denote the set of actions labeling at least one transition in $M$,
and define $\alpha B$ similarly. If $\alpha M \neq \alpha B$, then all transitions
labeled by an action in $(\alpha M \setminus \alpha B) \cup (\alpha B \setminus \alpha M)$
are removed from the $M_i$ and $B$; all unreachable states and transitions in the
$M_i$ and $B$ are also removed. This is repeated until $\alpha M = \alpha B$.
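
The fixpoint iteration behind this optimization can be sketched in Python. To keep the sketch short it treats M as a single explicit LTS rather than a parallel composition of components, which is a simplifying assumption; the names `reachable_part`, `labels`, and `prune` are hypothetical.

```python
def reachable_part(trans, inits):
    """Keep only transitions whose source is reachable from inits."""
    succ = {}
    for (p, a, r) in trans:
        succ.setdefault(p, []).append(r)
    seen, stack = set(inits), list(inits)
    while stack:
        p = stack.pop()
        for r in succ.get(p, []):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return {(p, a, r) for (p, a, r) in trans if p in seen}


def labels(trans):
    """Actions labeling at least one transition."""
    return {a for (_, a, _) in trans}


def prune(m_trans, m_init, b_trans, b_inits):
    """Repeat label pruning and reachability until both label sets agree."""
    m = reachable_part(m_trans, [m_init])
    b = reachable_part(b_trans, b_inits)
    while labels(m) != labels(b):
        common = labels(m) & labels(b)
        m = reachable_part({t for t in m if t[1] in common}, [m_init])
        b = reachable_part({t for t in b if t[1] in common}, b_inits)
    return m, b


# Removing "b" makes the "c"-transition of M unreachable, which in turn
# removes "c" from the automaton in the second round.
M_T = {(0, "a", 1), (1, "b", 2), (2, "c", 3)}
B_T = {("s0", "a", "s0"), ("s0", "c", "s0")}
m_pruned, b_pruned = prune(M_T, 0, B_T, ["s0"])
```

The loop terminates because each round in which the label sets differ removes at least one transition.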

We applied the model checker to all problems in the 2019 benchmarks. Inter- 
estingly, all 180 tasks completed, with the correct results, using at most 8 GB 
RAM; the times are given in Fig. 2. 

We also ran these problems with POR turned off, to measure the impact 
of that optimization. As is often the case with POR schemes, the difference is 
dramatic. The non-POR tests ran out of memory on our 256 GB machine after 
problem 106. We show the resources consumed for a representative task in Fig. 3; 
this property holds, so a complete search is required. In terms of number of states 
or time, the performance differs by about 5 orders of magnitude. 

Problem      101  102  103  104  105  106  107  108  109
Components     8   10   12   15   20   25   50   60   70
Time (s)       1    1    1    1    1    1   14   54  432


Fig. 2. Time to solve RERS 2019 parallel LTL problems using McRERS. Each problem
comprises 20 LTL formulas. Memory limited to 8 GB. Rows: problem number, number
of components in the LTS, and total McRERS wall time rounded up to nearest second.


POR?   States saved   Transitions    Memory (MB)    Time (s)
YES    1.55 × 10^?    1.55 × 10^?    1.26 × 10^?    < 0.1
NO     1.89 × 10^?    1.35 × 10^?    2.61 × 10^?    7865.0


Fig. 3. Performance impact of POR on solving RERS 2019 problem 106, formula 1,
(a6 → F a7) W (a7 ∨ a88). Exponents marked ? are illegible in the source.


Tool     States         Transitions    Memory (MB)    Time (s)
SPIN     8.16 × 10^?    2.01 × 10^?    1.09 × 10^?    292.0
McRERS   1.80 × 10^?    1.93 × 10^?    5.06 × 10^?    < 0.1


Fig. 4. Performance of SPIN v6.5.1 and McRERS on RERS 2019 problem 101, property
1. Both tools used POR. SPIN used -DCOLLAPSE for state compression and -m100000000
for search depth bound. Exponents marked ? are illegible in the source.


As explained in Sect. 5, the RERS SPIN models cannot be expected to perform
well. We ran the latest version of SPIN on these using -DCOLLAPSE compression.
We show the result for just the first task in Fig. 4. There is a performance
difference of at least 4 orders of magnitude (measured in states or time)
between the tools. An examination of SPIN's output in verbose mode reveals the
problem to be as described in Sect. 5: the full set of enabled transitions is
explored at each transition due to the update of the shared variable.

The 2016 RERS problems are more challenging for McRERS. The problems
are numbered from 101 to 120. To scale beyond problem 111, with a memory
bound of 256 GB, additional reduction techniques, such as the component
minimization methods discussed in Sect. 5, must be used. We plan to carry out a
thorough study of those methods and how they interact with POR.

Abstract. SMT-based model checkers, especially IC3-style ones, are
currently the most effective techniques for verification of infinite state 
systems. They infer global inductive invariants via local reasoning about 
a single step of the transition relation of a system, while employing SMT- 
based procedures, such as interpolation, to mitigate the limitations of 
local reasoning and allow for better generalization. Unfortunately, these 
mitigations intertwine model checking with heuristics of the underlying 
SMT-solver, negatively affecting stability of model checking. 

In this paper, we propose to tackle the limitations of locality in a 
systematic manner. We introduce explicit global guidance into the local 
reasoning performed by IC3-style algorithms. To this end, we extend the 
SMT-IC3 paradigm with three novel rules, designed to mitigate funda- 
mental sources of failure that stem from locality. We instantiate these 
rules for the theory of Linear Integer Arithmetic and implement them on 
top of SPACER solver in Z3. Our empirical results show that GSPACER, 
SPACER extended with global guidance, is significantly more effective 
than both SPACER and sole global reasoning, and, furthermore, is insen- 
sitive to interpolation. 


1 Introduction 


SMT-based Model Checking algorithms that combine SMT-based search for
bounded counterexamples with interpolation-based search for inductive invari- 
ants are currently the most effective techniques for verification of infinite state 
systems. They are widely applicable, including for verification of synchronous 
systems, protocols, parameterized systems, and software. 

The Achilles heel of these approaches is the mismatch between the local 
reasoning used to establish absence of bounded counterexamples and a global 
reason for absence of unbounded counterexamples (i.e., existence of an induc- 
tive invariant). This is particularly apparent in IC3-style algorithms [7], such as 
SPACER [18]. IC3-style algorithms establish bounded safety by repeatedly com- 
puting predecessors of error (or bad) states, blocking them by local reasoning 
about a single step of the transition relation of the system, and, later, using
the resulting lemmas to construct a candidate inductive invariant for the global 
safety proof. The whole process is driven by the choice of local lemmas. Good 
lemmas lead to quick convergence, bad lemmas make even simple-looking prob- 
lems difficult to solve. 

The effect of local reasoning is somewhat mitigated by the use of interpo- 
lation in lemma construction. In addition to the usual inductive generalization 
by dropping literals from a blocked bad state, interpolation is used to further 
generalize the blocked state using theory-aware reasoning. For example, when
blocking a bad state $x = 1 \wedge y = 1$, inductive generalization would infer a
sub-clause of $x \neq 1 \vee y \neq 1$ as a lemma, while interpolation might infer
$x \neq y$, a predicate that might be required for the inductive invariant.
SPACER, which is based on this idea, is extremely effective, as demonstrated by
its performance
in recent CHC-COMP competitions [10]. The downside, however, is that the 
approach leads to a highly unstable procedure that is extremely sensitive to syn- 
tactic changes in the system description, changes in interpolation algorithms, 
and any algorithmic changes in the underlying SMT-solver. 

An alternative approach, often called invariant inference, is to focus on the 
global safety proof, i.e., an inductive invariant. This has long been advocated by 
such approaches as Houdini [15], and, more recently, by a variety of machine- 
learning inspired techniques, e.g., FreqHorn [14], LinearArbitrary [28], and ICE- 
DT [16]. The key idea is to iteratively generate positive (i.e., reachable states) 
and negative (i.e., states that reach an error) examples and to compute a can- 
didate invariant that separates these two sets. The reasoning is more focused 
towards the invariant, and, the search is restricted by either predicates, tem- 
plates, grammars, or some combination. Invariant inference approaches are par- 
ticularly good at finding simple inductive invariants. However, they do not gen- 
eralize well to a wide variety of problems. In practice, they are often used to 
complement other SMT-based techniques. 

In this paper, we present a novel approach that extends what we call the local
reasoning of IC3-style algorithms with global guidance inspired by the invariant
inference algorithms described above. Our main insight is that the set of
lemmas maintained by IC3-style algorithms hints towards a potential global proof.
However, these hints are lost in existing approaches. We observe that letting the
current set of lemmas, which represent candidate global invariants, guide local
reasoning by introducing new lemmas and states to be blocked is often sufficient
to direct IC3 towards a better global proof.

We present and implement our results in the context of SPACER—a solver 
for Constrained Horn Clauses (CHC)—implemented in the Z3 SMT-solver [13]. 
SPACER is used by multiple software model checking tools, performed remarkably 
well in CHC-COMP competitions [10], and is open-sourced. However, our results 
are fundamental and apply to any other IC3-style algorithm. While our imple- 
mentation works with arbitrary CHC instances, we simplify the presentation by 
focusing on infinite state model checking of transition systems. 

We illustrate the pitfalls of local reasoning using three examples shown in 
Fig. 1. All three examples are small, simple, and have simple inductive invariants. 
All three are challenging for SPACER. While these examples are based on
SPACER-specific design choices, each exhibits a fundamental deficiency that stems from
local reasoning. We believe they can be adapted for any other IC3-style verification 
algorithm. The examples assume basic familiarity with the IC3 paradigm. Readers 
who are not familiar with it may find it useful to read the examples after reading 
Sect. 2. 


(a) myopic generalization

1 a, c := 0, 0;
2 // b, d := a, c;
3 b, d := 0, 0;
4 while (nd()) {
5   // inv: a − c = b − d
6   if (nd()) { a++; b++; }
7   else { c++; d++; }
8 }
9 assert(a < c ⇒ b < d);

(b) excessive generalization

a, b := 0, 0;
while (nd())
// inv: a ≥ 0 ∧ b ≥ 0
{
  a := a + b;
  b++;
}
assert(a ≥ 0);

(c) stuck in a rut

a, b, c := 0, 0, 0;
while (nd())
// inv: b = c
{
  a++; b++; c++;
}
assert(a ≥ 100 ⇒ b = c);


Fig. 1. Verification tasks to illustrate sources of divergence for SPACER. The call nd() 
non-deterministically returns a Boolean value. 


Myopic Generalization. SPACER diverges on the example in Fig. 1(a) by
iteratively learning lemmas of the form $(a - c \leq k) \Rightarrow (b - d \leq k)$
for different values of $k$, where $a$, $b$, $c$, $d$ are the program variables.
These lemmas establish that there are no counterexamples of longer and longer
lengths. However, the process never converges to the desired lemma
$(a - c) \leq (b - d)$, which excludes counterexamples of any length. The lemmas
are discovered using interpolation, based on proofs found by the SMT-solver. A
close examination of the corresponding proofs shows that the relationship
between $(a - c)$ and $(b - d)$ does not appear in the proofs, making it
impossible to find the desired lemma by tweaking local interpolation reasoning.
On the other hand, looking at the global proof (i.e., the set of lemmas
discovered to refute a bounded counterexample), it is almost obvious that
$(a - c) \leq (b - d)$ is an interesting generalization to try. Amusingly, a
small, syntactic, but semantics-preserving change of swapping line 2 for line 3
in Fig. 1(a) changes the SMT-solver proofs, affects local interpolation, and makes
the instance trivial for SPACER.


Excessive (Predecessor) Generalization. SPACER diverges on the example in
Fig. 1(b) by computing an infinite sequence of lemmas of the form
$a + k_1 \times b \geq k_2$, where $a$ and $b$ are program variables, and $k_1$
and $k_2$ are integers. The root cause is excessive generalization in
predecessor computation. The Bad states are $a < 0$, and their predecessors are
states such as $(a = 1 \land b = -10)$, $(a = 2 \land b = -10)$, etc., or, more
generally, regions $(a + b < 0)$, $(a + 2b < -1)$, etc. SPACER always attempts
to compute the most general predecessor states. This is the best local strategy,
but blocking these regions by learning their negation leads to the
aforementioned lemmas. According to the global proof, these lemmas do not
converge to a linear invariant. An alternative strategy that under-approximates
the problematic regions by (numerically) simpler regions and, as a result,
learns simpler lemmas is desired (and is effective on this example). For
example, region $a + 3b < -4$ can be under-approximated by
$a < 32 \land b < -12$, eventually leading to a lemma $b \geq 0$ that is a part
of the final invariant: $(a \geq 0 \land b \geq 0)$.
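The growth of the predecessor regions can be reproduced with a few lines of
arithmetic. The sketch below is only an illustration, not part of SPACER: it
assumes the loop body of Fig. 1(b) is `a := a + b; b++`, encodes a half-plane
$c_a \cdot a + c_b \cdot b < k$ as a hypothetical triple `(ca, cb, k)`, and
computes its exact pre-image by substitution.

```python
def preimage(region):
    """Pre-image of the half-plane ca*a + cb*b < k under a := a + b; b := b + 1."""
    ca, cb, k = region
    # Substituting a -> a + b and b -> b + 1 gives
    #   ca*(a + b) + cb*(b + 1) < k   <=>   ca*a + (ca + cb)*b < k - cb
    return (ca, ca + cb, k - cb)


def predecessor_regions(steps):
    """Iterated pre-images of the bad region a < 0, encoded as (1, 0, 0)."""
    regions, current = [], (1, 0, 0)
    for _ in range(steps):
        current = preimage(current)
        regions.append(current)
    return regions


if __name__ == "__main__":
    for ca, cb, k in predecessor_regions(4):
        print(f"{ca}*a + {cb}*b < {k}")
```

Every step yields a half-plane with a new slope, so blocking the regions one by
one produces a new lemma each time and never repeats.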


Stuck in a Rut. Finally, SPACER converges on the example in Fig. 1(c), but only
after unrolling the system for 100 iterations. During the first 100 iterations,
SPACER learns that program states with $(a \geq 100 \land b \neq c)$ are not
reachable because $a$ is bounded by 1 in the first iteration, by 2 in the
second, and so on. In each iteration, the global proof is updated by replacing
a lemma of the form $a < k$ by a lemma of the form $a < (k + 1)$ for different
values of $k$. Again, the strategy is good locally: the total number of lemmas
does not grow and the bounded proof is improved. Yet, globally, it is clear
that no progress is made since the same set of bad states is blocked again and
again in slightly different ways. An alternative strategy is to abstract the
literal $a \geq 100$ from the formula that represents the bad states, and,
instead, conjecture that no states in $b \neq c$ are reachable.


Our Approach: Global Guidance. As shown in the examples above, in all the
cases where SPACER diverges, the missteps are not obvious locally, but are
clear when the overall proof is considered. We propose three new rules,
Subsume, Concretize, and Conjecture, that provide global guidance, by
considering existing lemmas, to mitigate the problems illustrated above.
Subsume introduces a
lemma that generalizes existing ones, Concretize under-approximates partially- 
blocked predecessors to focus on repeatedly unblocked regions, and Conjecture 
over-approximates a predecessor by abstracting away regions that are repeatedly 
blocked. The rules are generic, and apply to arbitrary SMT theories. Further- 
more, we propose an efficient instantiation of the rules for the theory Linear 
Integer Arithmetic. 

We have implemented the new strategy, called GSPACER, in SPACER and 
compared it to the original implementation of SPACER. We show that GSPACER 
outperforms SPACER in benchmarks from CHC-COMP 2018 and 2019. More sig- 
nificantly, we show that the performance is independent of interpolation. While 
SPACER is highly dependent on interpolation parameters, and performs poorly 
when interpolation is disabled, the results of GSPACER are virtually unaffected 
by interpolation. We also compare GSPACER to LinearArbitrary [28], a tool that 
infers invariants using global reasoning. GSPACER outperforms LinearArbitrary 
on the benchmarks from [28]. These results indicate that global guidance miti- 
gates the shortcomings of local reasoning. 

The rest of the paper is structured as follows. Sect. 2 presents the necessary 
background. Sect. 3 introduces our global guidance as a set of abstract inference 
rules. Sect. 4 describes an instantiation of the rules to Linear Integer Arithmetic 
(LIA). Sect. 5 presents our empirical evaluation. Finally, Sect. 7 describes related 
work and concludes the paper. 


2 Background 


Logic. We consider first-order logic modulo theories, and adopt the standard
notation and terminology. A first-order language modulo theory $\mathcal{T}$ is
defined over a signature $X$ that consists of constant, function and predicate
symbols, some of which may be interpreted by $\mathcal{T}$. As always, terms are
constant symbols, variables, or function symbols applied to terms; atoms are
predicate symbols applied to terms; literals are atoms or their negations; cubes
are conjunctions of literals; and clauses are disjunctions of literals. Unless
otherwise stated, we only consider closed formulas (i.e., formulas without any
free variables). As usual, we use sets of formulas and their conjunctions
interchangeably.


MBP. Given a set of constants $v$, a formula $\varphi$ and a model
$M \models \varphi$, Model Based Projection (MBP) of $\varphi$ over the
constants $v$, denoted $\mathrm{MBP}(v, \varphi, M)$, computes a
model-preserving under-approximation of $\varphi$ projected onto
$X \setminus v$. That is, $\mathrm{MBP}(v, \varphi, M)$ is a formula over
$X \setminus v$ such that $M \models \mathrm{MBP}(v, \varphi, M)$ and any model
$M' \models \mathrm{MBP}(v, \varphi, M)$ can be extended to a model
$M'' \models \varphi$ by providing an interpretation for $v$. There are
polynomial time algorithms for computing MBP in Linear Arithmetic [5,18].


Interpolation. Given an unsatisfiable formula $A \land B$, an interpolant,
denoted $\mathrm{ITP}(A, B)$, is a formula $I$ over the shared signature of $A$
and $B$ such that $A \Rightarrow I$ and $I \Rightarrow \neg B$.


Safety Problem. A transition system is a pair $\langle Init, Tr \rangle$, where
$Init$ is a formula over $X$ and $Tr$ is a formula over $X \cup X'$, where
$X' = \{s' \mid s \in X\}$ (see footnote 1). The states of the system correspond
to structures over $X$, $Init$ represents the initial states and $Tr$ represents
the transition relation, where $X$ is used to represent the pre-state of a
transition, and $X'$ is used to represent the post-state. For a formula
$\varphi$ over $X$, we denote by $\varphi'$ the formula obtained by substituting
each $s \in X$ by $s' \in X'$. A safety problem is a triple
$\langle Init, Tr, Bad \rangle$, where $\langle Init, Tr \rangle$ is a
transition system and $Bad$ is a formula over $X$ representing a set of bad
states.

The safety problem $\langle Init, Tr, Bad \rangle$ has a counterexample of
length $k$ if the following formula is satisfiable:
$Init^0 \land \bigwedge_{i=0}^{k-1} Tr^i \land Bad^k$, where $\varphi^i$ is
defined over $X^i = \{s^i \mid s \in X\}$ (a copy of the signature used to
represent the state of the system after the execution of $i$ steps) and is
obtained from $\varphi$ by substituting each $s \in X$ by $s^i \in X^i$, and
$Tr^i$ is obtained from $Tr$ by substituting $s \in X$ by $s^i \in X^i$ and
$s' \in X'$ by $s^{i+1} \in X^{i+1}$. The transition system is safe if the
safety problem has no counterexample, of any length.


Footnote 1: In fact, a primed copy is introduced in $X'$ only for the
uninterpreted symbols in $X$. Interpreted symbols remain the same in $X'$.
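Algorithm 1: SPACER algorithm as a set of guarded commands. We use the
shorthand $F(\varphi) = U' \lor (\varphi \land Tr)$.

function SPACER:
  In:  $\langle Init, Tr, Bad \rangle$
  Out: $\langle \mathrm{SAFE}, Inv \rangle$ or UNSAFE

  $Q := \emptyset$                                      // pob queue
  $N := 0$                                              // maximum safe level
  $O_0 := Init$, $O_i := \top$ for all $i > 0$          // lemma trace
  $U := Init$                                           // reachable states

  forever do
    Candidate    [ $\mathrm{isSat}(O_N \land Bad)$ ]
                 $Q := Q \cup \langle Bad, N \rangle$
    Predecessor  [ $\langle \varphi, i+1 \rangle \in Q$, $M \models O_i \land Tr \land \varphi'$ ]
                 $Q := Q \cup \langle \mathrm{MBP}(\bar{x}', Tr \land \varphi', M), i \rangle$
    Successor    [ $\langle \varphi, i+1 \rangle \in Q$, $M \models F(U) \land \varphi'$ ]
                 $U := U \lor \mathrm{MBP}(\bar{x}, F(U), M)[\bar{x}' \mapsto \bar{x}]$
    Conflict     [ $\langle \varphi, i+1 \rangle \in Q$, $F(O_i) \Rightarrow \neg\varphi'$ ]
                 $O_j := (O_j \land \mathrm{ITP}(F(O_i), \varphi')[\bar{x}' \mapsto \bar{x}])$ for all $j \leq i+1$
    Induction    [ $\ell \in O_{i+1}$, $\ell = (\varphi \lor \psi)$, $F(\varphi \land O_i) \Rightarrow \varphi'$ ]
                 $O_j := O_j \land \varphi$ for all $j \leq i+1$
    Propagate    [ $\ell \in O_i$, $O_i \land Tr \Rightarrow \ell'$ ]
                 $O_{i+1} := (O_{i+1} \land \ell)$
    Unfold       [ $O_N \Rightarrow \neg Bad$ ]
                 $N := N + 1$
    Safe         [ $O_{i+1} \Rightarrow O_i$ for some $i < N$ ]
                 return $\langle \mathrm{SAFE}, O_i \rangle$
    Unsafe       [ $\mathrm{isSat}(Bad \land U)$ ]
                 return UNSAFE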


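Algorithm 2: Global guidance rules for SPACER.

  Subsume     [ $\mathcal{L} \subseteq O_i$, $k \geq i$, $F(O_k) \Rightarrow \psi'$, $\forall \ell \in \mathcal{L}.\ \psi \Rightarrow \ell$ ]
              $O_j := (O_j \land \psi)$ for all $j \leq k+1$

  Concretize  [ $\mathcal{L} \subseteq O_i$, $\langle \varphi, j \rangle \in Q$, $\forall \ell \in \mathcal{L}.\ \mathrm{isSat}(\varphi \land \neg\ell)$, $\mathrm{isSat}(\varphi \land \bigwedge \mathcal{L})$, $\gamma \Rightarrow \varphi$, $\mathrm{isSat}(\gamma \land \bigwedge \mathcal{L})$ ]
              $Q := Q \cup \langle \gamma, k+1 \rangle$ where $k = \max\{j \mid O_j \Rightarrow \neg\gamma\}$

  Conjecture  [ $\mathcal{L} \subseteq O_i$, $\langle \varphi, j \rangle \in Q$, $\varphi \equiv \alpha \land \beta$, $\forall \ell \in \mathcal{L}.\ \ell \Rightarrow \neg\beta \land \mathrm{isSat}(\ell \land \alpha)$, $U \Rightarrow \neg\alpha$ ]
              $Q := Q \cup \langle \alpha, k+1 \rangle$ where $k = \max\{j \mid O_j \Rightarrow \neg\alpha\}$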


Inductive Invariants. An inductive invariant is a formula $Inv$ over $X$ such
that (i) $Init \Rightarrow Inv$, (ii) $Inv \land Tr \Rightarrow Inv'$, and
(iii) $Inv \Rightarrow \neg Bad$. If such an inductive invariant exists, then
the transition system is safe.
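The three conditions can be tested on a bounded set of states before handing
them to a solver. The sketch below is an illustration only: it assumes the
system of Fig. 1(b) with initial state $a = b = 0$, a single transition
`a := a + b; b++` (the non-deterministic loop exit is ignored), and bad states
$a < 0$. The function names and the bound are hypothetical, and a passing
bounded check is evidence, not a proof.

```python
from itertools import product


def init(a, b):
    """Init: a = 0 and b = 0."""
    return a == 0 and b == 0


def step(a, b):
    """Tr: a' = a + b and b' = b + 1 (assumed loop body of Fig. 1(b))."""
    return a + b, b + 1


def bad(a, b):
    """Bad: a < 0."""
    return a < 0


def inv(a, b):
    """Candidate invariant: a >= 0 and b >= 0."""
    return a >= 0 and b >= 0


def check_inductive(candidate, bound=20):
    """Test conditions (i)-(iii) on all states with |a|, |b| <= bound."""
    states = list(product(range(-bound, bound + 1), repeat=2))
    initiation = all(candidate(a, b) for a, b in states if init(a, b))
    consecution = all(candidate(*step(a, b)) for a, b in states if candidate(a, b))
    safety = all(not bad(a, b) for a, b in states if candidate(a, b))
    return initiation, consecution, safety
```

On this system the candidate $a \geq 0 \land b \geq 0$ passes all three checks,
while $a \geq 0$ alone fails consecution (for instance from $a = 0$, $b = -1$).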


Spacer. The safety problem defined above is an instance of a more general prob- 
lem, CHC-SAT, of satisfiability of Constrained Horn Clauses (CHC). SPACER is 
a semi-decision procedure for CHC-SAT. However, to simplify the presentation, 
we describe the algorithm only for the particular case of the safety problem. We 
stress that SPACER, as well as the developments of this paper, apply to the more 
general setting of CHCs (both linear and non-linear). We assume that the only 
uninterpreted symbols in $X$ are constant symbols, which we denote $\bar{x}$. Typically,
these represent program variables. Without loss of generality, we assume that 
Bad is a cube. 

Algorithm 1 presents the key ingredients of SPACER as a set of guarded
commands (or rules). It maintains the following: the current unrolling depth
$N$ at which a counterexample is searched (there are no counterexamples with
depth less than $N$); a trace $O = (O_0, O_1, \ldots)$ of frames, such that each
frame $O_i$ is a set of lemmas, and each lemma $\ell \in O_i$ is a clause; a
queue of proof obligations $Q$, where each proof obligation (POB) in $Q$ is a
pair $\langle \varphi, i \rangle$ of a cube $\varphi$ and a level number $i$,
$0 \leq i \leq N$; and an under-approximation $U$ of reachable states.
Intuitively, each frame $O_i$ is a candidate inductive invariant s.t. $O_i$
over-approximates states reachable up to $i$ steps from $Init$. The latter is
ensured since $O_0 = Init$, the trace is monotone, i.e.,
$O_{i+1} \subseteq O_i$, and each frame is inductive relative to its previous
one, i.e., $O_i \land Tr \Rightarrow O'_{i+1}$. Each POB
$\langle \varphi, i \rangle$ in $Q$ corresponds to a suffix of a potential
counterexample that has to be blocked in $O_i$, i.e., has to be proven
unreachable in $i$ steps.

The Candidate rule adds an initial POB $\langle Bad, N \rangle$ to the queue.
If a POB $\langle \varphi, i \rangle$ cannot be blocked because $\varphi$ is
reachable from frame $(i - 1)$, the Predecessor rule generates a predecessor
$\psi$ of $\varphi$ using MBP and adds $\langle \psi, i - 1 \rangle$ to $Q$. The
Successor rule updates the set of reachable states if the POB is reachable. If
the POB is blocked, the Conflict rule strengthens the trace $O$ by using
interpolation to learn a new lemma $\ell$ that blocks the POB, i.e., $\ell$
implies $\neg\varphi$. The Induction rule strengthens a lemma by inductive
generalization and the Propagate rule pushes a lemma to a higher frame. If the
Bad state has been blocked at $N$, the Unfold rule increments the depth of
unrolling $N$. In practice, the rules are scheduled to ensure progress towards
finding a counterexample.


3 Global Guidance of Local Proofs 


As illustrated by the examples in Fig. 1, while SPACER is generally effective, its 
local reasoning is easily confused. The effectiveness is very dependent on the 
local computation of predecessors using model-based projection, and lemmas 
using interpolation. In this section, we extend SPACER with three additional 
global reasoning rules. The rules are inspired by the deficiencies illustrated by 
the motivating examples in Fig. 1. In this section, we present the rules abstractly, 
independent of any underlying theory, focusing on pre- and post-conditions. In 
Sect. 4, we specialize the rules for Linear Integer Arithmetic, and show how
they are scheduled with the other rules of SPACER in an efficient verification
algorithm. The new global rules are summarized in Algorithm 2. We use the
same guarded command notation as in the description of SPACER in Algorithm 1.
Note that the rules supplement, rather than replace, the ones in Algorithm 1.


Subsume is the most natural rule to explain. It says that if there is a set of
lemmas $\mathcal{L}$ at level $i$, and there exists a formula $\psi$ such that
(a) $\psi$ is stronger than every lemma in $\mathcal{L}$, and (b) $\psi$
over-approximates states reachable in at most $k$ steps, where $k \geq i$, then
$\psi$ can be added to the trace to subsume $\mathcal{L}$. This rule reduces the
size of the global proof, that is, the number of total not-subsumed lemmas.
Note that the rule allows $\psi$ to be at a level $k$ that is higher than $i$.
The choice of $\psi$ is left open. The details are likely to be specific to the
theory involved. For example, when instantiated for LIA, Subsume is sufficient
to solve the example in Fig. 1(a). Interestingly, Subsume is not likely to be
effective for propositional IC3. In that case, $\psi$ is a clause and the only
way for it to be stronger than $\mathcal{L}$ is for $\psi$ to be a syntactic
sub-sequence of every lemma in $\mathcal{L}$, but such $\psi$ is already
explored by local inductive generalization (rule Induction in Algorithm 1).
Concretize applies to a POB, unlike Subsume. It is motivated by the example in
Fig. 1(b) that highlights the problem of excessive local generalization. SPACER
always computes as general predecessors as possible. This is necessary for
refutational completeness since in an infinite state system there are
infinitely many potential predecessors. Computing the most general predecessor
ensures that SPACER finds a counterexample, if it exists. However, this also
forces SPACER to discover more general, and sometimes more complex, lemmas than
might be necessary for an inductive invariant. Without a global view of the
overall proof, it is hard to determine when the algorithm generalizes too much.
The intuition for Concretize is that generalization is excessive when there is
a single POB $\langle \varphi, j \rangle$ that is not blocked, yet there is a
set of lemmas $\mathcal{L}$ such that every lemma $\ell \in \mathcal{L}$
partially blocks $\varphi$. That is, for any $\ell \in \mathcal{L}$, there is a
sub-region $\varphi_\ell$ of POB $\varphi$ that is blocked by $\ell$ (i.e.,
$\ell \Rightarrow \neg\varphi_\ell$), and there is at least one state
$s \in \varphi$ that is not blocked by any existing lemma in $\mathcal{L}$
(i.e., $s \models \varphi \land \bigwedge \mathcal{L}$). In this case,
Concretize computes an under-approximation $\gamma$ of $\varphi$ that includes
some not-yet-blocked state $s$. The new POB is added to the lowest level at
which $\gamma$ is not yet blocked. Concretize is useful to solve the example in
Fig. 1(b).


Conjecture guides the algorithm away from being stuck in the same part of the
search space. A single POB $\varphi$ might be blocked by a different lemma at
each level that $\varphi$ appears in. This indicates that the lemmas are too
strong, and cannot be propagated successfully to a higher level. The goal of
the Conjecture rule is to identify such a case to guide the algorithm to
explore alternative proofs with a better potential for generalization. This is
done by abstracting away the part of the POB that has been blocked in the past.
The pre-condition for Conjecture is the existence of a POB
$\langle \varphi, j \rangle$ such that $\varphi$ is split into two (not
necessarily disjoint) sets of literals, $\alpha$ and $\beta$. Second, there
must be a set of lemmas $\mathcal{L}$, at a (typically much lower) level
$i < j$ such that every lemma $\ell \in \mathcal{L}$ blocks $\varphi$, and,
moreover, blocks $\varphi$ by blocking $\beta$. Intuitively, this implies that
while there are many different lemmas (i.e., all lemmas in $\mathcal{L}$) that
block $\varphi$ at different levels, all of them correspond to a local
generalization of $\neg\beta$ that could not be propagated to block $\varphi$ at
higher levels. In this case, Conjecture abstracts the POB $\varphi$ into
$\alpha$, hoping to generate an alternative way to block $\varphi$. Of course,
$\alpha$ is conjectured only if it is not already blocked and does not contain
any known reachable states. Conjecture is necessary for a quick convergence on
the example in Fig. 1(c). In some respect, Conjecture is akin to widening in
Abstract Interpretation [12]: it abstracts a set of states by dropping
constraints that appear to prevent further exploration. Of course, it is also
quite different since it does not guarantee termination. While Conjecture is
applicable to propositional IC3 as well, it is much more significant in an
SMT-based setting since in many FOL theories a single literal in a POB might
result in infinitely many distinct lemmas.

Each of the rules can be applied by itself, but they are most effective in 
combination. For example, Concretize creates less general predecessors, that, in 
the worst case, lead to many simple lemmas. At the same time, Subsume combines 
lemmas together into more complex ones. The interaction of the two produces 
lemmas that neither one can produce in isolation. At the same time, Conjecture 
helps unstick the algorithm from a single unproductive POB, allowing the other
rules to take effect.


4 Global Guidance for Linear Integer Arithmetic 


In this section, we present a specialization of our general rules, shown in 
Algorithm 2, to the theory of Linear Integer Arithmetic (LIA). This requires 
solving two problems: identifying subsets of lemmas for pre-conditions of the 
rules (clearly using all possible subsets is too expensive), and applying the rule 
once its pre-condition is met. For lemma selection, we introduce a notion of syn- 
tactic clustering based on anti-unification. For rule application, we exploit basic 
properties of LIA for an effective algorithm. Our presentation is focused on 
LIA exclusively. However, the rules extend to combinations of LIA with other 
theories, such as the combined theory of LIA and Arrays. 

The rest of this section is structured as follows. We begin with a brief back- 
ground on LIA in Sect. 4.1. We then present our lemma selection scheme, which 
is common to all the rules, in Sect. 4.2, followed by a description of how the rules 
Subsume (in Sect. 4.3), Concretize (in Sect. 4.4), and Conjecture (in Sect. 4.5) 
are instantiated for LIA. We conclude in Sect. 4.6 with an algorithm that inte- 
grates all the rules together. 


4.1 Linear Integer Arithmetic: Background 


In the theory of Linear Integer Arithmetic (LIA), formulas are defined over a
signature that includes interpreted function symbols $+$, $-$, $\times$,
interpreted predicate symbols $<$, $\leq$, $\mid$, interpreted constant symbols
$0, 1, 2, \ldots$, and uninterpreted constant symbols
$a, b, \ldots, x, y, \ldots$. We write $\mathbb{Z}$ for the set of interpreted
constant symbols, and call them integers. We use constants to refer exclusively
to the uninterpreted constants (these are often called variables in LIA
literature). Terms (and accordingly formulas) in LIA are restricted to be
linear, that is, multiplication is never applied to two constants.

We write $\mathrm{LIA}^{-\mathrm{div}}$ for the fragment of LIA that excludes
divisibility ($d \mid h$) predicates. A literal in $\mathrm{LIA}^{-\mathrm{div}}$
is a linear inequality; a cube is a conjunction of such inequalities, that is,
a polytope. We find it convenient to use matrix-based notation for representing
cubes in $\mathrm{LIA}^{-\mathrm{div}}$. A ground cube
$c \in \mathrm{LIA}^{-\mathrm{div}}$ with $p$ inequalities (literals) over $k$
(uninterpreted) constants is written as $A \cdot x \leq n$, where $A$ is a
$p \times k$ matrix of coefficients in $\mathbb{Z}^{p \times k}$,
$x = (x_1 \cdots x_k)^T$ is a column vector that consists of the
(uninterpreted) constants, and $n = (n_1 \cdots n_p)^T$ is a column vector in
$\mathbb{Z}^p$. For example, the cube $x \geq 2 \land 2x + y \leq 3$ is written
as
$$\begin{bmatrix} -1 & 0 \\ 2 & 1 \end{bmatrix} \cdot \begin{bmatrix} x \\ y \end{bmatrix} \leq \begin{bmatrix} -2 \\ 3 \end{bmatrix}.$$
In the sequel, all vectors are column vectors, super-script $T$ denotes
transpose, dot is used for a dot product and $[n_1; n_2]$ stands for a matrix
of column vectors $n_1$ and $n_2$.
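The matrix notation is mechanical to produce. The following sketch is an
illustration under assumed conventions, not the authors' code: a literal is a
hypothetical triple `(coeffs, op, bound)` with `op` restricted to `"<="` and
`">="`, and a `>=` literal is negated on both sides so that every row has the
form $A \cdot x \leq n$.

```python
def cube_to_matrix(literals, constants):
    """Convert literals (coeffs, op, bound) into (A, n) with A . x <= n.

    `constants` fixes the order of the entries of the vector x.
    """
    A, n = [], []
    for coeffs, op, bound in literals:
        row = [coeffs.get(c, 0) for c in constants]
        if op == ">=":
            # t >= b is rewritten as -t <= -b
            row, bound = [-v for v in row], -bound
        elif op != "<=":
            raise ValueError(f"unsupported operator: {op}")
        A.append(row)
        n.append(bound)
    return A, n


# The cube x >= 2 /\ 2x + y <= 3 over the constants (x, y).
EXAMPLE = [({"x": 1}, ">=", 2), ({"x": 2, "y": 1}, "<=", 3)]
```

For `EXAMPLE` the result is the coefficient matrix with rows `[-1, 0]` and
`[2, 1]` and the bound vector `[-2, 3]`.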
4.2 Lemma Selection 


A common pre-condition for all of our global rules in Algorithm 2 is the
existence of a subset of lemmas $\mathcal{L}$ of some frame $O_i$. Attempting to
apply the rules for every subset of $O_i$ is infeasible. In practice, we use
syntactic similarity between lemmas as a predictor that one of the global rules
is applicable, and restrict $\mathcal{L}$ to subsets of syntactically similar
lemmas. In the rest of this section, we formally define what we mean by
syntactic similarity, and how syntactically similar subsets of lemmas, called
clusters, are maintained efficiently throughout the algorithm.


Syntactic Similarity. A formula $\pi$ with free variables is called a pattern.
Note that we do not require $\pi$ to be in LIA. Let $\sigma$ be a substitution,
i.e., a mapping from variables to terms. We write $\pi\sigma$ for the result of
replacing all occurrences of free variables in $\pi$ with their mapping under
$\sigma$. A substitution $\sigma$ is called numeric if it maps every variable to
an integer, i.e., the range of $\sigma$ is $\mathbb{Z}$. We say that a formula
$\varphi$ numerically matches a pattern $\pi$ iff there exists a numeric
substitution $\sigma$ such that $\varphi = \pi\sigma$. Note that, as usual, the
equality is syntactic. For example, consider the pattern
$\pi = v_0 a + v_1 b < 0$ with free variables $v_0$ and $v_1$ and uninterpreted
constants $a$ and $b$. The formula $\varphi_1 = 3a + 4b < 0$ matches $\pi$ via a
numeric substitution $\sigma_1 = \{v_0 \mapsto 3, v_1 \mapsto 4\}$. However,
$\varphi_2 = 4b + 3a < 0$, while semantically equivalent to $\varphi_1$, does
not match $\pi$. Similarly, $\varphi_3 = a + b < 0$ does not match $\pi$ either.

Matching is extended to patterns in the usual way by allowing a substitution
$\sigma$ to map variables to variables. We say that a pattern $\pi_1$ is more
general than a pattern $\pi_2$ if $\pi_2$ matches $\pi_1$. A pattern $\pi$ is a
numeric anti-unifier for a pair of formulas $\varphi_1$ and $\varphi_2$ if both
$\varphi_1$ and $\varphi_2$ match $\pi$ numerically. We write
$\mathrm{anti}(\varphi_1, \varphi_2)$ for a most general numeric anti-unifier of
$\varphi_1$ and $\varphi_2$. We say that two formulas $\varphi_1$ and
$\varphi_2$ are syntactically similar if there exists a numeric anti-unifier
between them (i.e., $\mathrm{anti}(\varphi_1, \varphi_2)$ is defined).
Anti-unification is extended to sets of formulas in the usual way.


Clusters. We use anti-unification to define clusters of syntactically similar
formulas. Let $\Phi$ be a fixed set of formulas, and $\pi$ a pattern. A cluster,
$C_\Phi(\pi)$, is a subset of $\Phi$ such that every formula
$\varphi \in C_\Phi(\pi)$ numerically matches $\pi$. That is, $\pi$ is a numeric
anti-unifier for $C_\Phi(\pi)$. In the implementation, we restrict the
pre-conditions of the global rules so that a subset of lemmas
$\mathcal{L} \subseteq O_i$ is a cluster for some pattern $\pi$, i.e.,
$\mathcal{L} = C_{O_i}(\pi)$.


Clustering Lemmas. We use the following strategy to efficiently keep track of
available clusters. Let $\ell_{new}$ be a new lemma to be added to $O_i$. Assume
there is at least one lemma $\ell \in O_i$ that numerically anti-unifies with
$\ell_{new}$ via some pattern $\pi$. If such an $\ell$ does not belong to any
cluster, a new cluster $C_{O_i}(\pi) = \{\ell_{new}, \ell\}$ is formed, where
$\pi = \mathrm{anti}(\ell_{new}, \ell)$. Otherwise, for every lemma
$\ell \in O_i$ that numerically matches $\ell_{new}$ and every cluster
$C_{O_i}(\hat{\pi})$ containing $\ell$, $\ell_{new}$ is added to
$C_{O_i}(\hat{\pi})$ if $\ell_{new}$ matches $\hat{\pi}$, or a new cluster is
formed using $\ell$, $\ell_{new}$, and any other lemmas in $C_{O_i}(\hat{\pi})$
that anti-unify with them. Note that a new lemma $\ell_{new}$ might belong to
multiple clusters.
For example, suppose $\ell_{new} = (a \leq 6 \lor b \leq 6)$, and there is
already a cluster
$C_{O_i}(a \leq v_0 \lor b \leq 5) = \{(a \leq 5 \lor b \leq 5), (a \leq 8 \lor b \leq 5)\}$.
Since $\ell_{new}$ anti-unifies with each of the lemmas in the cluster, but does
not match the pattern $a \leq v_0 \lor b \leq 5$, a new cluster that includes
all of them is formed w.r.t. a more general pattern:
$C_{O_i}(a \leq v_0 \lor b \leq v_1) = \{(a \leq 6 \lor b \leq 6), (a \leq 5 \lor b \leq 5), (a \leq 8 \lor b \leq 5)\}$.
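The clustering example can be replayed with a small anti-unification routine.
This is a sketch under assumed conventions (nested-tuple formulas, pattern
variables named `?v0`, `?v1`, ...), and it only abstracts positions where the
formulas carry different integers; it is not the implementation used in
GSPACER.

```python
from itertools import count


def anti_unify(formulas, fresh=None):
    """Numeric anti-unifier of a list of formulas, or None if none exists."""
    fresh = count() if fresh is None else fresh
    first = formulas[0]
    if all(f == first for f in formulas):
        return first                       # identical sub-terms are kept
    if all(isinstance(f, int) for f in formulas):
        return f"?v{next(fresh)}"          # differing numerals become a variable
    if all(isinstance(f, tuple) and len(f) == len(first) for f in formulas):
        parts = [anti_unify(list(column), fresh) for column in zip(*formulas)]
        return None if None in parts else tuple(parts)
    return None                            # structural mismatch


L1 = ("or", ("<=", "a", 5), ("<=", "b", 5))
L2 = ("or", ("<=", "a", 8), ("<=", "b", 5))
L_NEW = ("or", ("<=", "a", 6), ("<=", "b", 6))

OLD_PATTERN = anti_unify([L1, L2])          # a <= ?v0 \/ b <= 5
NEW_PATTERN = anti_unify([L_NEW, L1, L2])   # a <= ?v0 \/ b <= ?v1
```

The existing cluster keeps the numeral 5 in its pattern, while adding `L_NEW`
forces the second bound to be abstracted as well, giving the more general
pattern of the example.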

In the presentation above, we assumed that anti-unification is completely 
syntactic. This is problematic in practice since it significantly limits the applica- 
bility of the global rules. Recall, for example, that a+b < 0 and 2a+2b < 0 do not 
anti-unify numerically according to our definitions, and, therefore, do not cluster 
together. In practice, we augment syntactic anti-unification with simple rewrite 
rules that are applied greedily. For example, we normalize all LIA terms, take 
care of implicit multiplication by 1, and of associativity and commutativity of 
addition. In the future, it would be interesting to explore how advanced
anti-unification algorithms, such as [8,27], can be adapted for our purpose.


4.3 Subsume Rule for LIA 


Recall that the Subsume rule (Algorithm 2) takes a cluster of lemmas
$\mathcal{L} = C_{O_i}(\pi)$ and computes a new lemma $\psi$ that subsumes all
the lemmas in $\mathcal{L}$, that is, $\psi \Rightarrow \bigwedge \mathcal{L}$.
We find it convenient to dualize the problem. Let
$S = \{\neg\ell \mid \ell \in \mathcal{L}\}$ be the dual of $\mathcal{L}$;
clearly $\psi \Rightarrow \bigwedge \mathcal{L}$ iff
$(\bigvee S) \Rightarrow \neg\psi$. Note that $\mathcal{L}$ is a set of clauses,
$S$ is a set of cubes, $\psi$ is a clause, and $\neg\psi$ is a cube. In the case
of $\mathrm{LIA}^{-\mathrm{div}}$, this means that $\bigvee S$ represents a
union of convex sets, and $\neg\psi$ represents a convex set that the Subsume
rule must find. The strongest such $\neg\psi$ in $\mathrm{LIA}^{-\mathrm{div}}$
exists, and is the convex closure of $S$. Thus, applying Subsume in the context
of $\mathrm{LIA}^{-\mathrm{div}}$ is reduced to computing a convex closure of a
set of (negated) lemmas in a cluster. Full LIA extends
$\mathrm{LIA}^{-\mathrm{div}}$ with divisibility constraints. Therefore, Subsume
obtains a stronger $\neg\psi$ by adding such constraints.


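Example 1. For example, consider the following cluster:

$\mathcal{L} = \{(x > 2 \lor x < 2 \lor y > 3),\ (x > 4 \lor x < 4 \lor y > 5),\ (x > 8 \lor x < 8 \lor y > 9)\}$
$S = \{(x \leq 2 \land x \geq 2 \land y \leq 3),\ (x \leq 4 \land x \geq 4 \land y \leq 5),\ (x \leq 8 \land x \geq 8 \land y \leq 9)\}$

The convex closure of $S$ in $\mathrm{LIA}^{-\mathrm{div}}$ is
$2 \leq x \leq 8 \land y \leq x + 1$. However, a stronger over-approximation
exists in LIA: $2 \leq x \leq 8 \land y \leq x + 1 \land (2 \mid x)$.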
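The claims of Example 1 can be spot-checked by enumeration. The sketch below is
only an illustration on a finite grid (the grid bounds and function names are
assumptions), so it tests the inclusions on sampled points rather than proving
them.

```python
def in_S(x, y):
    """Disjunction of the three cubes in S."""
    return (x == 2 and y <= 3) or (x == 4 and y <= 5) or (x == 8 and y <= 9)


def closure_no_div(x, y):
    """Claimed convex closure without divisibility: 2 <= x <= 8 and y <= x + 1."""
    return 2 <= x <= 8 and y <= x + 1


def closure_lia(x, y):
    """Stronger over-approximation in full LIA: additionally 2 | x."""
    return closure_no_div(x, y) and x % 2 == 0


GRID = [(x, y) for x in range(-5, 15) for y in range(-15, 15)]

# Both formulas contain every sampled point of S.
S_IN_NO_DIV = all(closure_no_div(x, y) for x, y in GRID if in_S(x, y))
S_IN_LIA = all(closure_lia(x, y) for x, y in GRID if in_S(x, y))
# Points that separate the three sets.
ONLY_NO_DIV = [p for p in GRID if closure_no_div(*p) and not closure_lia(*p)]
EXTRA_IN_LIA = [p for p in GRID if closure_lia(*p) and not in_S(*p)]
```

The divisibility constraint removes odd values of $x$ such as $x = 3$, yet the
result is still a strict over-approximation: for instance $x = 6$ satisfies it
without belonging to any cube of $S$.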


In the sequel, we describe SUBSUMECUBE (Algorithm 3) which computes a cube
$\varphi$ that over-approximates $(\bigvee S)$. Subsume is then implemented by
removing from $\mathcal{L}$ lemmas that are already subsumed by existing lemmas
in $\mathcal{L}$, dualizing the result into $S$, invoking SUBSUMECUBE on $S$ and
returning $\neg\varphi$ as a lemma that subsumes $\mathcal{L}$.

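Recall that Subsume is tried only in the case $\mathcal{L} = C_{O_i}(\pi)$. We
further require that the negated pattern, $\neg\pi$, is of the form
$A \cdot x \leq v$, where $A$ is a coefficients matrix, $x$ is a vector of
constants and $v = (v_1 \cdots v_p)^T$ is a vector of $p$ free variables. Under
this assumption, $S$ (the dual of $\mathcal{L}$) is of the form
$\{(A \cdot x \leq n_i) \mid 1 \leq i \leq q\}$, where $q = |S|$, and for each
$1 \leq i \leq q$, $n_i$ is a numeric substitution to $v$ from which one of the
negated lemmas in $S$ is obtained. That is, $|n_i| = |v|$. In Example 1,
$\neg\pi = x \leq v_1 \land -x \leq v_2 \land y \leq v_3$ and

$$A = \begin{bmatrix} 1 & 0 \\ -1 & 0 \\ 0 & 1 \end{bmatrix} \quad
x = \begin{bmatrix} x \\ y \end{bmatrix} \quad
v = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} \quad
n_1 = \begin{bmatrix} 2 \\ -2 \\ 3 \end{bmatrix} \quad
n_2 = \begin{bmatrix} 4 \\ -4 \\ 5 \end{bmatrix} \quad
n_3 = \begin{bmatrix} 8 \\ -8 \\ 9 \end{bmatrix}$$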


Each cube $(A \cdot x \leq n_i) \in S$ is equivalent to
$\exists v.\ A \cdot x \leq v \land (v = n_i)$.

Finally, $(\bigvee S) \equiv \exists v.\ (A \cdot x \leq v) \land (\bigvee (v = n_i))$.
Thus, computing the over-approximation of $S$ is reduced to (a) computing the
convex hull $H$ of a set of points $\{n_i \mid 1 \leq i \leq q\}$, (b) computing
divisibility constraints $D$ that are satisfied by all the points, (c)
substituting $H \land D$ for the disjunction in the equation above, and (d)
eliminating variables $v$. Both the computation of $H \land D$ and the
elimination of $v$ may be prohibitively expensive. We, therefore,
over-approximate them. Our approach for doing so is presented in Algorithm 3,
and explained in detail below.
Computing the convex hull of $\{n_i \mid 1 \leq i \leq q\}$. Lines 3 to 8
compute the convex hull of $\{n_i \mid 1 \leq i \leq q\}$ as a formula over $v$,
where variable $v_j$, for $1 \leq j \leq p$, represents the $j$-th coordinates
in the vectors (points) $n_i$. Some of the coordinates, $v_j$, in these vectors
may be linearly dependent upon others. To simplify the problem, we first
identify such dependencies and compute a set of linear equalities that
expresses them ($L$ in line 4). To do so, we consider a matrix $N_{q \times p}$,
where the $i$-th row consists of $n_i^T$. The $j$-th column in $N$, denoted
$N_{*j}$, corresponds to the $j$-th coordinate, $v_j$. The rank of $N$ is the
number of linearly independent columns (and rows). The other columns
(coordinates) can be expressed by linear combinations of the linearly
independent ones. To compute these linear combinations we use the kernel of
$[N; 1]$ ($N$ appended with a column vector of 1's), which is the set of all
vectors $y$ such that $[N; 1] \cdot y = 0$, where $0$ is the zero vector. Let
$B = \mathrm{kernel}([N; 1])$ be a basis for the kernel of $[N; 1]$. Then
$|B| = p - \mathrm{rank}(N)$, and for each vector $y \in B$, the linear equality
$[v_1 \cdots v_p\ 1] \cdot y = 0$ holds in all the rows of $N$ (i.e., all the
given vectors satisfy it). We accumulate these equalities, which capture the
linear dependencies between the coordinates, in $L$. Further, the equalities
are used to compute $\mathrm{rank}(N)$ coordinates (columns in $N$) that are
linearly independent and, modulo $L$, uniquely determine the remaining
coordinates. We denote by $v^{L\downarrow}$ the subset of $v$ that consists of
the linearly independent coordinates. We further denote by $n_i^{L\downarrow}$
the projection of $n_i$ to these coordinates and by $N^{L\downarrow}$ the
projection of $N$ to the corresponding columns. We have that
$(\bigvee (v = n_i)) \equiv L \land (\bigvee (v^{L\downarrow} = n_i^{L\downarrow}))$.


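In Example 1, the numeral matrix is
$$N = \begin{bmatrix} 2 & -2 & 3 \\ 4 & -4 & 5 \\ 8 & -8 & 9 \end{bmatrix},$$
for which $\mathrm{kernel}([N; 1]) = \{(1\ 1\ 0\ 0)^T, (1\ 0\ {-1}\ 1)^T\}$.
Therefore, $L$ is the conjunction of equalities
$v_1 + v_2 = 0 \land v_1 - v_3 + 1 = 0$, or, equivalently,
$v_3 = v_1 + 1 \land v_2 = -v_1$, $v^{L\downarrow} = (v_1)^T$, and
$$n_1^{L\downarrow} = \begin{bmatrix} 2 \end{bmatrix} \quad
n_2^{L\downarrow} = \begin{bmatrix} 4 \end{bmatrix} \quad
n_3^{L\downarrow} = \begin{bmatrix} 8 \end{bmatrix} \quad
N^{L\downarrow} = \begin{bmatrix} 2 \\ 4 \\ 8 \end{bmatrix}$$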
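The kernel used to derive $L$ can be computed with exact rational Gaussian
elimination. The sketch below is a generic null-space routine written for
illustration (the function name and the row-list encoding are assumptions, and
it is not the routine used in the authors' implementation). It is applied to
$[N; 1]$ for the three points $(2, -2, 3)$, $(4, -4, 5)$, $(8, -8, 9)$ of
Example 1.

```python
from fractions import Fraction


def kernel_basis(rows):
    """Basis of {y | M . y = 0} for a matrix M given as a list of rows."""
    m = [[Fraction(v) for v in row] for row in rows]
    ncols = len(m[0])
    pivots = []                      # pivot column of each reduced row
    r = 0
    for c in range(ncols):
        if r == len(m):
            break
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue                 # no pivot here: c is a free column
        m[r], m[pivot] = m[pivot], m[r]
        scale = m[r][c]
        m[r] = [v / scale for v in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    basis = []
    for free in (c for c in range(ncols) if c not in pivots):
        y = [Fraction(0)] * ncols
        y[free] = Fraction(1)
        for row, p in zip(m, pivots):
            y[p] = -row[free]        # solve each pivot variable from its row
        basis.append(y)
    return basis


# [N; 1] for Example 1: each row is a point n_i followed by 1.
N_WITH_ONES = [[2, -2, 3, 1], [4, -4, 5, 1], [8, -8, 9, 1]]
```

Each basis vector $y$ gives one equality $[v_1\ v_2\ v_3\ 1] \cdot y = 0$; for
`N_WITH_ONES` these are $v_1 + v_2 = 0$ and $v_1 - v_3 + 1 = 0$.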


Next, we compute the convex closure of
$\bigvee (v^{L\downarrow} = n_i^{L\downarrow})$, and conjoin it with $L$ to
obtain $H$, the convex closure of $(\bigvee (v = n_i))$.

If the dimension of $v^{L\downarrow}$ is one, as is the case in the example
above, the convex closure, $C$, of
$\bigvee (v^{L\downarrow} = n_i^{L\downarrow})$ is obtained by bounding the sole
element of $v^{L\downarrow}$ based on its values in $N^{L\downarrow}$ (line 6).
In Example 1, we obtain $C = 2 \leq v_1 \leq 8$.

If the dimension of $v^{L\downarrow}$ is greater than one, just computing the
bounds of one of the constants is not sufficient. Instead, we use the concept
of syntactic convex closure from [2] to compute the convex closure of
$\bigvee (v^{L\downarrow} = n_i^{L\downarrow})$ as $\exists \alpha.\ C$, where
$\alpha$ is a vector that consists of $q$ fresh rational variables and $C$ is
defined as follows (line 8):
$C = \alpha \geq 0 \land \Sigma \alpha = 1 \land \alpha^T \cdot N^{L\downarrow} = (v^{L\downarrow})^T$.
$C$ states that $(v^{L\downarrow})^T$ is a convex combination of the rows of
$N^{L\downarrow}$, or, in other words, $v^{L\downarrow}$ is a convex
combination of $\{n_i^{L\downarrow} \mid 1 \leq i \leq q\}$.
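To illustrate the syntactic convex closure, consider a second example with a
set of cubes:
$S = \{(x \leq 0 \land y \leq 6),\ (x \leq 6 \land y \leq 0),\ (x \leq 5 \land y \leq 5)\}$.
The coefficient matrix $A$, and the numeral matrix $N$ are then:
$$A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \quad \text{and} \quad
N = \begin{bmatrix} 0 & 6 \\ 6 & 0 \\ 5 & 5 \end{bmatrix}.$$
Here, $\mathrm{kernel}([N; 1])$ is empty: all the columns are linearly
independent, hence, $L = true$ and $v^{L\downarrow} = v$. Therefore, syntactic
convex closure is applied to the full matrix $N$, resulting in

$$C = (\alpha_1 \geq 0) \land (\alpha_2 \geq 0) \land (\alpha_3 \geq 0) \land (\alpha_1 + \alpha_2 + \alpha_3 = 1) \land
(6\alpha_2 + 5\alpha_3 = v_1) \land (6\alpha_1 + 5\alpha_3 = v_2)$$

The convex closure of $\bigvee (v = n_i)$ is then $L \land \exists \alpha.\ C$,
which is $\exists \alpha.\ C$ here.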
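For three affinely independent points the constraint $C$ determines $\alpha$
uniquely, so membership in the closure can be decided by solving a $3 \times 3$
system. The sketch below does this with Cramer's rule over exact rationals for
the points $(0, 6)$, $(6, 0)$, $(5, 5)$ of the second example; it is an
illustration restricted to exactly three points in the plane (function names
are assumptions), not a general procedure for $\exists \alpha.\ C$.

```python
from fractions import Fraction


def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def convex_coefficients(points, v):
    """Solve alpha^T . N = v^T and sum(alpha) = 1 for three points N."""
    system = [[Fraction(p[0]) for p in points],   # first coordinates
              [Fraction(p[1]) for p in points],   # second coordinates
              [Fraction(1)] * 3]                  # sum(alpha) = 1
    rhs = [Fraction(v[0]), Fraction(v[1]), Fraction(1)]
    d = det3(system)
    if d == 0:
        raise ValueError("points are affinely dependent")
    alphas = []
    for k in range(3):
        replaced = [row[:k] + [b] + row[k + 1:] for row, b in zip(system, rhs)]
        alphas.append(det3(replaced) / d)
    return alphas


def in_closure(points, v):
    """v lies in the convex closure iff all coefficients are non-negative."""
    return all(a >= 0 for a in convex_coefficients(points, v))


N = [(0, 6), (6, 0), (5, 5)]
```

For example, $(3, 3)$ is the midpoint of the first two points, so it belongs to
the closure, whereas $(0, 0)$ requires a negative $\alpha_3$ and does not.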


Divisibility Constraints. Inductive invariants for verification problems often
require divisibility constraints. We, therefore, use such constraints, denoted
$D$, to obtain a stronger over-approximation of $\bigvee (v = n_i)$ than the
convex closure. To add a divisibility constraint for $v_j \in v^{L\downarrow}$,
we consider the column $N^{L\downarrow}_{*j}$ that corresponds to $v_j$ in
$N^{L\downarrow}$. We find the largest positive integer $d$ such that each
integer in $N^{L\downarrow}_{*j}$ leaves the same remainder when divided by $d$;
namely, there exists $0 \leq r < d$ such that $n \bmod d = r$ for every
$n \in N^{L\downarrow}_{*j}$. This means that $d \mid (v_j - r)$ is satisfied by
all the points $n_i$. Note that such $r$ always exists for $d = 1$. To avoid
this trivial case, we add the constraint $d \mid (v_j - r)$ only if $d \neq 1$
(line 12). We repeat this process for each $v_j \in v^{L\downarrow}$.

In Example 1, all the elements in the (only) column of the matrix
$N^{L\downarrow}$, which corresponds to $v_1$, are divisible by 2, and no larger
$d$ has a corresponding $r$. Thus, line 12 of Algorithm 3 adds the divisibility
condition $(2 \mid v_1)$ to $D$.
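The largest such $d$ is the greatest common divisor of the differences between
the column entries, since $d$ divides $n - n'$ for every pair exactly when all
entries share a remainder modulo $d$. The sketch below illustrates this; the
function name and the choice to return `None` for the trivial cases are
assumptions.

```python
from functools import reduce
from math import gcd


def divisibility_constraint(column):
    """Return (d, r) with d > 1 maximal and n % d == r for all n, else None."""
    d = reduce(gcd, (abs(n - column[0]) for n in column), 0)
    if d <= 1:
        # d == 1 is the trivial constraint; d == 0 means all entries are equal.
        return None
    return d, column[0] % d
```

For the column $(2, 4, 8)$ of Example 1 this yields $d = 2$ and $r = 0$, that
is, the constraint $2 \mid v_1$.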
Eliminating Existentially Quantified Variables Using MBP. By combining the
linear equalities exhibited by $N$, the convex closure of $N^{L\downarrow}$ and
the divisibility constraints on $v$, we obtain
$\exists \alpha.\ L \land C \land D$ as an over-approximation of
$\bigvee (v = n_i)$. Accordingly, $\exists v.\ \exists \alpha.\ \psi$, where
$\psi = (A \cdot x \leq v) \land L \land C \land D$, is an over-approximation of
$(\bigvee S) \equiv \exists v.\ (A \cdot x \leq v) \land (\bigvee (v = n_i))$
(line 13). In order to get a LIA cube that over-approximates $\bigvee S$, it
remains to eliminate the existential quantifiers. Since quantifier elimination
is expensive, and does not necessarily generate convex formulas (cubes), we
approximate it using MBP. Namely, we obtain a cube $\varphi$ that
under-approximates $\exists v.\ \exists \alpha.\ \psi$ by applying MBP on $\psi$
and a model $M_0 \models \psi$. We then use an SMT solver to drop literals from
$\varphi$ until it over-approximates $\exists v.\ \exists \alpha.\ \psi$, and
hence also $\bigvee S$ (lines 16 to 19). The result is returned by Subsume as an
over-approximation of $\bigvee S$.

Models $M_0$ that satisfy $\psi$ and do not satisfy any of the cubes in $S$ are
preferred when computing MBP (line 14) as they ensure that the result of MBP
is not subsumed by any of the cubes in $S$.

Note that the $\alpha$ are rational variables and $v$ are integer variables, which
means we require MBP to support a mixture of integer and rational variables. To
achieve this, we first relax all constants to be rationals and apply MBP over LRA
to eliminate $\alpha$. We then adjust the resulting formula back to integer arithmetic
by multiplying each atom by the least common multiple of the denominators of
the coefficients in it. Finally, we apply MBP over the integers to eliminate $v$.
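
The adjustment back to integer arithmetic can be illustrated with a small Python sketch. It scales one atom of the form $\sum_i c_i x_i \le b$ by the least common multiple of the denominators. Two choices here are assumptions for illustration: the atom is represented as a dictionary of coefficients plus a bound, and the denominator of the bound is included in the lcm so that the whole atom becomes integral. Requires Python 3.9 or later for `math.lcm`.

```python
from fractions import Fraction
from math import lcm

def to_integer_atom(coeffs, bound):
    """Scale sum(coeffs[x] * x) <= bound to integer coefficients.

    Multiplying both sides by a positive integer preserves the inequality.
    """
    fracs = [Fraction(c) for c in coeffs.values()] + [Fraction(bound)]
    scale = lcm(*(f.denominator for f in fracs))
    int_coeffs = {x: int(Fraction(c) * scale) for x, c in coeffs.items()}
    return int_coeffs, int(Fraction(bound) * scale)

# (1/2) x + (1/3) y <= 5/6  becomes  3 x + 2 y <= 5
print(to_integer_atom({"x": Fraction(1, 2), "y": Fraction(1, 3)}, Fraction(5, 6)))
```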

Considering Example 1 again, we get that $\psi = (x \le v_1) \land (-x \le v_2) \land (y \le v_3) \land (v_3 = 1 + v_1) \land (v_2 = -v_1) \land (2 \le v_1 \le 8) \land (2 \mid v_1)$ (the first three conjuncts
correspond to $A \cdot (x\ y)^T \le (v_1\ v_2\ v_3)^T$). Note that in this case we do not have
rational variables $\alpha$ since $|v^{L\downarrow}| = 1$. Depending on the model, the result of MBP
can be one of


$y \le x + 1 \land 2 \le x \le 8 \land (2 \mid y - 1) \land (2 \mid x)$          $x \ge 2 \land x \le 2 \land y \le 3$
$y \le x + 1 \land 2 \le x \le 8 \land (2 \mid x)$                               $x \ge 8 \land x \le 8 \land y \le 9$
$y \ge x + 1 \land y \le x + 1 \land 3 \le y \le 9 \land (2 \mid y - 1)$


However, preferring a model that does not satisfy any cube in $S = \{(x \ge 2 \land x \le 2 \land y \le 3),\ (x \le 4 \land x \ge 4 \land y \le 5),\ (x \le 8 \land x \ge 8 \land y \le 9)\}$ rules out the two
possibilities on the right. None of these cubes cover $\psi$, hence generalization is
used.

If the first cube is obtained by MBP, it is generalized into $y \le x + 1 \land x \ge 2 \land x \le 8 \land (2 \mid x)$; the second cube is already an over-approximation; the third
cube is generalized into $y \le x + 1 \land y \le 9$. Indeed, each of these cubes
over-approximates $\bigvee S$.
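
The generalization step for Example 1 can be replayed with a brute-force Python sketch. Instead of an SMT solver, it enumerates the $(x, y)$ points that satisfy $\exists v.\, \psi$ inside a finite window (from $\psi$: $x = v_1$, $v_1 \in \{2, 4, 6, 8\}$, $y \le v_1 + 1$) and keeps exactly the literals that no enumerated point falsifies. The window bounds and the literal encoding are assumptions for illustration, so this gives evidence for the example only, not a proof.

```python
# Models of (exists v. psi) for Example 1, restricted to a finite window.
WINDOW = range(-20, 21)
models = [(x, y) for x in (2, 4, 6, 8) for y in WINDOW if y <= x + 1]

def generalize(cube):
    """Drop every literal that is falsified by some model of psi."""
    return [name for name, lit in cube if all(lit(x, y) for x, y in models)]

first = [
    ("y <= x + 1", lambda x, y: y <= x + 1),
    ("x >= 2", lambda x, y: x >= 2),
    ("x <= 8", lambda x, y: x <= 8),
    ("2 | y - 1", lambda x, y: (y - 1) % 2 == 0),
    ("2 | x", lambda x, y: x % 2 == 0),
]
third = [
    ("y >= x + 1", lambda x, y: y >= x + 1),
    ("y <= x + 1", lambda x, y: y <= x + 1),
    ("y >= 3", lambda x, y: y >= 3),
    ("y <= 9", lambda x, y: y <= 9),
    ("2 | y - 1", lambda x, y: (y - 1) % 2 == 0),
]

print(generalize(first))  # ['y <= x + 1', 'x >= 2', 'x <= 8', '2 | x']
print(generalize(third))  # ['y <= x + 1', 'y <= 9']
```

The surviving literals coincide with the generalized cubes stated in the text for the first and third MBP results.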


4.4 Concretize Rule for LIA 


The Concretize rule (Algorithm 2) takes a cluster of lemmas $\mathcal{L} = \mathcal{C}_{\mathcal{O}_i}(\pi)$ and a
POB $\langle \varphi, j \rangle$ such that each lemma in $\mathcal{L}$ partially blocks $\varphi$, and creates a new POB $\gamma$
that is still not blocked by $\mathcal{L}$, but $\gamma$ is more concrete, i.e., $\gamma \Rightarrow \varphi$. In our
implementation, this rule is applied when $\varphi$ is in $\mathrm{LIA}^{-div}$. We further require that the
pattern, $\pi$, of $\mathcal{L}$ is non-linear, i.e., some of the constants appear in $\pi$ with free variables
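
Algorithm 3: An implementation of the Subsume rule for the dual of a cluster
$S = \{A \cdot x \le n_i \mid 1 \le i \le q\}$.

 1 function SUBSUMECUBE:
     In: $S = \{(A \cdot x \le n_i) \mid 1 \le i \le q\}$
     Out: An over-approximation of $(\bigvee S)$.
     /* $v$ are integer variables such that:
        $(\bigvee S) \equiv \exists v.\, (A \cdot x \le v) \land (\bigvee_i (v = n_i))$ */
 2   $N := [n_1; \ldots; n_q]$
     /* Compute the set of linear dependencies implied by N */
 3   $B := \mathrm{kernel}([N; 1])$
 4   $L := \bigwedge_{y \in B} (v^T\ 1) \cdot y = 0$
 5   if $|v^{L\downarrow}| = 1$ then
       // Convex closure over a single constant $v_i \in v^{L\downarrow}$
 6     $C := \min(N_{*i}) \le v_i \le \max(N_{*i})$
 7   else
       // Syntactic convex closure
 8     $C := (\alpha^T \cdot N^{L\downarrow} = (v^{L\downarrow})^T) \land (\Sigma \alpha = 1) \land (\alpha \ge 0)$
     /* Compute divisibility constraints */
 9   $D := \top$
10   foreach $v_i \in v^{L\downarrow}$ do
11     if $\exists d, r.\ d \ne 1 \land (\forall n \in N^{L\downarrow}_{*i}.\ (n \bmod d = r))$ then
12       $D := D \land d \mid (v_i - r)$
13   $\psi := (A \cdot x \le v) \land L \land C \land D$
     /* Under-approximate quantifier elimination */
14   find $M_0$ s.t. $M_0 \models \psi$ and, if possible, $M_0 \not\models (\bigvee S)$
15   $\varphi := \mathrm{MBP}((\alpha\ v), \psi, M_0)$
     /* Over-approximate quantifier elimination */
16   while ISSAT($\neg\varphi \land \psi$) do
17     find $M_1$ s.t. $M_1 \models (\neg\varphi \land \psi)$
18     $\varphi := \bigwedge \{\ell \in \varphi \mid \neg(M_1 \models \neg\ell)\}$
19   return $\varphi$


Algorithm 4: An implementation of the Concretize rule in LIA.

 1 function CONCRETIZE:
     In: A POB $\langle \varphi, j \rangle$ in $\mathrm{LIA}^{-div}$, a cluster of $\mathrm{LIA}^{-div}$ lemmas
         $\mathcal{L} = \mathcal{C}_{\mathcal{O}_i}(\pi)$ s.t. $\pi$ is non-linear, ISSAT($\varphi \land \bigwedge \mathcal{L}$)
     Out: A cube $\gamma$ such that $\gamma \Rightarrow \varphi$ and $\forall \ell \in \mathcal{L}.$ ISSAT($\gamma \land \ell$)
 2   $U := \{x \mid \mathrm{COEFF}(x, \pi) \in \mathrm{VARS}(\pi)\}$
 3   find $M$ s.t. $M \models \varphi \land \bigwedge \mathcal{L}$
 4   $\gamma := \top$
 5   foreach lit $\in \varphi$ do
 6     if Consts(lit) $\cap\ U \ne \emptyset$ then $\gamma := \gamma \land$ CONCRETIZE_LIT(lit, $M$, $U$)
 7     else $\gamma := \gamma \land$ lit
 8   $\gamma :=$ RM-SUBSUME($\gamma$)
 9   return $\gamma$

10 function CONCRETIZE_LIT:
     In: A literal lit $= \sum_i n_i x_i \le b_j$ in $\mathrm{LIA}^{-div}$, model $M \models$ lit,
         and a set of constants $U$
     Out: A cube $\gamma^{lit}$ that concretizes lit
     /* Construct a single literal using all the constants in Consts(lit) $\setminus\ U$ */
11   $s := 0$
12   $\gamma^{lit} := \top$
13   foreach $x_i \in$ Consts(lit) $\setminus\ U$ do
14     $s := s + n_i x_i$
15   $\gamma^{lit} := (s \le M[s])$
     /* Generate one dimensional literals for each constant in U */
16   foreach $x_i \in$ Consts(lit) $\cap\ U$ do
17     $\gamma^{lit} := \gamma^{lit} \land (n_i x_i \le M[n_i x_i])$
18   return $\gamma^{lit}$
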

as their coefficients. We denote these constants by $U$. An example is the pattern
$\pi = v_0 x + v_1 y + z \le 0$, where $U = \{x, y\}$. Having such a cluster is an indication
that attempting to block $\varphi$ in full with a single lemma may require tracking
non-linear correlations between the constants, which is impossible to do in LIA. In such
cases, we identify the coupling of the constants in $U$ in POBs (and hence in lemmas)
as the potential source of non-linearity. Hence, we concretize (strengthen) $\varphi$ into
a POB $\gamma$ where the constants in $U$ are no longer coupled to any other constant.


Coupling. Formally, constants $u$ and $v$ are coupled in a cube $c$, denoted $u \bowtie_c v$,
if there exists a literal lit in $c$ such that both $u$ and $v$ appear in lit (i.e., their
coefficients in lit are non-zero). For example, $x$ and $y$ are coupled in $x + y \le 0 \land z \le 0$, whereas neither of them is coupled with $z$. A constant $u$ is said to
be isolated in a cube $c$, denoted $\mathrm{Iso}(u, c)$, if it appears in $c$ but it is not coupled
with any other constant in $c$. In the above cube, $z$ is isolated.
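
The two definitions can be made concrete with a short Python sketch. A cube is represented here as a list of literals, and each literal as a dictionary mapping a constant to its coefficient on the left-hand side; bounds are omitted because coupling depends only on which constants occur. This representation and the function names are assumptions for illustration.

```python
def coupled(cube, u, v):
    """u and v are coupled if some literal has non-zero coefficients for both."""
    return any(lit.get(u, 0) != 0 and lit.get(v, 0) != 0 for lit in cube)

def isolated(cube, u):
    """u is isolated if it appears in the cube but is coupled with no other constant."""
    appears = any(lit.get(u, 0) != 0 for lit in cube)
    others = {c for lit in cube for c, n in lit.items() if n != 0} - {u}
    return appears and not any(coupled(cube, u, w) for w in others)

# The cube (x + y <= 0) and (z <= 0) from the text.
cube = [{"x": 1, "y": 1}, {"z": 1}]
print(coupled(cube, "x", "y"))  # True
print(coupled(cube, "x", "z"))  # False
print(isolated(cube, "z"))      # True
```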


Concretization by Decoupling. Given a POB $\varphi$ (a cube) and a cluster $\mathcal{L}$,
Algorithm 4 presents our approach for concretizing $\varphi$ by decoupling the constants
in $U$, that is, those that have variables as coefficients in the pattern of $\mathcal{L}$ (line 2).
Concretization is guided by a model $M \models \varphi \land \bigwedge \mathcal{L}$, representing a part of $\varphi$ that
is not yet blocked by the lemmas in $\mathcal{L}$ (line 3). Given such $M$, we concretize $\varphi$
into a model-preserving under-approximation that isolates all the constants in
$U$ and preserves all other couplings. That is, we find a cube $\gamma$, such that

$$\gamma \Rightarrow \varphi \qquad M \models \gamma \qquad \forall u \in U.\ \mathrm{Iso}(u, \gamma) \qquad \forall u, v \notin U.\ (u \bowtie_\varphi v) \Rightarrow (u \bowtie_\gamma v) \qquad (1)$$

Note that $\gamma$ is not blocked by $\mathcal{L}$ since $M$ satisfies both $\bigwedge \mathcal{L}$ and $\gamma$. For example,
if $\varphi = (x + y \le 0) \land (x - y \le 0) \land (x + z \ge 0)$ and $M = [x = 0, y = 0, z = 1]$, then
$\gamma = 0 \le y \le 0 \land x \le 0 \land x + z \ge 1$ is a model-preserving under-approximation
that isolates $U = \{y\}$.

Algorithm 4 computes such a cube $\gamma$ by a point-wise concretization of the
literals of $\varphi$ followed by the removal of subsumed literals. Literals that do not
contain constants from $U$ remain unchanged. A literal of the form lit $= t \le b$,
where $t = \sum_i n_i x_i$ (recall that every literal in $\mathrm{LIA}^{-div}$ can be normalized to this
form), that includes constants from $U$ is concretized into a cube by (1) isolating
each of the summands $n_i x_i$ in $t$ that include $U$ from the rest, and (2) for each
of the resulting sub-expressions creating a literal that uses its value in $M$ as a
bound. Formally, $t$ is decomposed to $s + \sum_{x_i \in U} n_i x_i$, where $s = \sum_{x_i \notin U} n_i x_i$. The
concretization of lit is the cube $\gamma^{lit} = s \le M[s] \land \bigwedge_{x_i \in U} n_i x_i \le M[n_i x_i]$, where
$M[t']$ denotes the interpretation of $t'$ in $M$. Note that $\gamma^{lit} \Rightarrow$ lit since the bounds
are stronger than the original bound on $t$: $M[s] + \sum_{x_i \in U} M[n_i x_i] = M[t] \le b$.
This ensures that $\gamma$, obtained by the conjunction of literal concretizations,
implies $\varphi$. It trivially satisfies the other conditions of Eq. (1).

For example, the concretization of the literal $(x + y \le 0)$ with respect to
$U = \{y\}$ and $M = [x = 0, y = 0, z = 1]$ is the cube $x \le 0 \land y \le 0$. Applying
concretization in a similar manner to all the literals of the cube $\varphi = (x + y \le 0) \land (x - y \le 0) \land (x + z \ge 0)$ from the previous example, we obtain the concretization
$x \le 0 \land 0 \le y \le 0 \land x + z \ge 0$. Note that the last literal is not concretized as it
does not include $y$.
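
The point-wise concretization can be sketched in Python as follows. Literals are assumed to be normalized to $\sum_i n_i x_i \le b$ and are represented as a pair of a coefficient dictionary and a bound, so $x + z \ge 0$ is written as $-x - z \le 0$. The removal of subsumed literals is approximated by dropping exact duplicates only; this simplification, the data representation, and the function names are assumptions and not the tool's implementation.

```python
def concretize_lit(coeffs, model, U):
    """Split sum(n_i * x_i) <= b into a literal over the constants outside U
    and one single-constant literal per constant in U, each bounded by its
    value in the model."""
    rest = {x: n for x, n in coeffs.items() if x not in U}
    cube = []
    if rest:
        cube.append((rest, sum(n * model[x] for x, n in rest.items())))
    for x, n in coeffs.items():
        if x in U:
            cube.append(({x: n}, n * model[x]))
    return cube

def concretize(cube, model, U):
    """Concretize every literal that mentions a constant from U."""
    result = []
    for coeffs, bound in cube:
        if any(x in U for x in coeffs):
            new_lits = concretize_lit(coeffs, model, U)
        else:
            new_lits = [(coeffs, bound)]
        for lit in new_lits:
            if lit not in result:  # simplified stand-in for subsumption removal
                result.append(lit)
    return result

phi = [({"x": 1, "y": 1}, 0), ({"x": 1, "y": -1}, 0), ({"x": -1, "z": -1}, 0)]
M = {"x": 0, "y": 0, "z": 1}
for lit in concretize(phi, M, {"y"}):
    print(lit)
```

On this input the result is $x \le 0$, $y \le 0$, $-y \le 0$ and $-x - z \le 0$, which is the cube $x \le 0 \land 0 \le y \le 0 \land x + z \ge 0$ from the example.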


4.5 Conjecture Rule for LIA 


The Conjecture rule (see Algorithm 2) takes a set of lemmas $\mathcal{L}$ and a POB
$\varphi \equiv \alpha \land \beta$ such that all lemmas in $\mathcal{L}$ block $\beta$, but none of them blocks $\alpha$, where
$\alpha$ does not include any known reachable states. It returns $\alpha$ as a new POB.

For LIA, Conjecture is applied when the following conditions are met: (1) the
POB $\varphi$ is of the form $\varphi_1 \land \varphi_2 \land \varphi_3$, where $\varphi_3 = (n^T \cdot x \le b)$, and $\varphi_1$ and $\varphi_2$ are
any cubes. The sub-cube $\varphi_1 \land \varphi_2$ acts as $\alpha$, while the sub-cube $\varphi_2 \land \varphi_3$ acts as $\beta$.
(2) The cluster $\mathcal{L}$ consists of $\{bg \lor (n^T \cdot x > b_i) \mid 1 \le i \le q\}$, where $b_i \ge b$ and
$bg \Rightarrow \neg\varphi_2$. This means that each of the lemmas in $\mathcal{L}$ blocks $\beta = \varphi_2 \land \varphi_3$, and they
may be ordered as a sequence of increasingly stronger lemmas, indicating that they
were created by trying to block the POB at different levels, leading to too strong
lemmas that failed to propagate to higher levels. (3) The formula $(bg \lor (n^T \cdot x > b_i)) \land \varphi_1 \land \varphi_2$ is satisfiable, that is, none of the lemmas in $\mathcal{L}$ block $\alpha = \varphi_1 \land \varphi_2$,
and (4) $U \Rightarrow \neg(\varphi_1 \land \varphi_2)$, that is, no state in $\varphi_1 \land \varphi_2$ is known to be reachable. If
all four conditions are met, we conjecture $\alpha = \varphi_1 \land \varphi_2$. This is implemented by
CONJECTURE, which returns $\alpha$ (or $\bot$ when the pre-conditions are not met).
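
Algorithm 5: GSPACER for LIA.

 1 function GSPACER:
     In: $\langle Init, Tr, Bad \rangle$
     Out: An inductive invariant or UNSAFE
     /* Initialize state of the solver */
 2   $Q := \emptyset$; $N := 0$; $U := Init$;
 3   $\mathcal{O}_0 := Init$; $\mathcal{O}_i := \top,\ \forall i > 0$
 4   ENQUEUE($Q$, $\langle Bad, 0 \rangle$)
 5   while $\top$ do
 6     $\langle \varphi, i \rangle :=$ POP($Q$)
 7     if CONCRETIZEPOB($\langle \varphi, i \rangle$) $= \top$ then
 8       continue
 9     if ISSAT($F(\mathcal{O}_{i-1}) \land \varphi'$) then
         // The pob $\varphi$ cannot be blocked at $i$
10       ADDPREDECESSOR($\langle \varphi, i \rangle$)
11       if ISSAT($U \land Bad$) then
12         return UNSAFE                                   // Unsafe
13     else
         // The pob $\varphi$ can be blocked at $i$
14       BLOCK($\langle \varphi, i \rangle$)
15     for $0 \le j \le N$ do
16       for $\ell \in \mathcal{O}_j \setminus \mathcal{O}_{j+1}$ do
17         if $\mathcal{O}_j \land Tr \Rightarrow \ell'$ then
18           $\mathcal{O}_{j+1} := \mathcal{O}_{j+1} \land \ell$               // Propagate
19     if $\exists\, 0 < j \le N : \mathcal{O}_j \Rightarrow \mathcal{O}_{j-1}$ then
20       return (SAFE, $\mathcal{O}_j$)                              // Safe
21     if $\mathcal{O}_N \Rightarrow \neg Bad$ then
22       $N := N + 1$                                      // Unfold
23       PUSH($Q$, $\langle Bad, N \rangle$)

24 function CONCRETIZEPOB:
     In: $\langle \varphi, i \rangle$
25   $(\pi_1, \mathcal{L}_1) := \mathcal{C}_{pob}(\langle \varphi, i \rangle)$
26   $\mathcal{L}_2 := \{\ell \mid \ell \in \mathcal{L}_1 \land$ ISSAT($\ell \land \varphi$) $\land$ ISSAT($\neg\ell \land \varphi$)$\}$
27   if ($\mathcal{L}_2 \ne \emptyset\ \land$ NONLIN($\pi_1$) $\land$ ISSAT($\bigwedge \mathcal{L}_2 \land \varphi$)) then
28     $\gamma :=$ CONCRETIZE($\varphi$, $(\pi_1, \mathcal{L}_2)$)
29     $k := \max\{j \mid \mathcal{O}_j \Rightarrow \neg\gamma\}$
30     PUSH($Q$, $\langle \gamma, k \rangle$)                               // Concretize
31     PUSH($Q$, $\langle \varphi, i \rangle$)
32     return $\top$
33   else return $\bot$

34 function ADDPREDECESSOR:
     In: $\langle \varphi, i \rangle$
35   if ISSAT($F(U) \land \varphi'$) then
36     find $M_1$ s.t. $M_1 \models F(U) \land \varphi'$
37     $s := (\mathrm{MBP}(x, F(U), M_1)[x' \mapsto x])$
38     $U := U \lor s$                                       // Successor
39     return
40   find $M_2$ s.t. $M_2 \models \mathcal{O}_{i-1} \land Tr \land \varphi'$
41   $\psi := \mathrm{MBP}(x', Tr \land \varphi', M_2)$
42   PUSH($Q$, $\langle \psi, i - 1 \rangle$)                             // Predecessor
43   PUSH($Q$, $\langle \varphi, i \rangle$)

44 function BLOCK:
     In: $\langle \varphi, i \rangle$
45   $\ell :=$ GEN($F(\mathcal{O}_{i-1})$, $\varphi'$)                        // Conflict
46   for $0 \le j \le i$ do $\mathcal{O}_j := \mathcal{O}_j \land \ell$
47   $(\pi_3, \mathcal{L}_3) := \mathcal{C}_{lemma}(\ell)$
48   $\alpha :=$ CONJECTURE($\varphi$, $\mathcal{L}_3$, $U$)
49   if $\alpha \ne \bot$ then
50     $k := \max\{j \mid \mathcal{O}_j \Rightarrow \neg\alpha\}$
51     PUSH($Q$, $\langle \alpha, k \rangle$)                               // Conjecture
52   if $\neg\pi_3 = A \cdot x \le v$ then
53     $\nu :=$ SUBSUME($(\pi_3, \mathcal{L}_3)$)
54     $k := \max\{j \mid F(\mathcal{O}_j) \Rightarrow \nu'\}$
55     $\mathcal{O}_j := \mathcal{O}_j \land \nu$ for all $j \le k + 1$              // Subsume
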

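For example, consider the POB $\varphi = x \ge 10 \land (x + y \ge 10) \land y \le 10$ and a
cluster of lemmas $\mathcal{L} = \{(x + y \le 0 \lor y \ge 101),\ (x + y \le 0 \lor y \ge 102)\}$. In this
case, $\varphi_1 = x \ge 10$, $\varphi_2 = (x + y \ge 10)$, $\varphi_3 = y \le 10$, and $bg = x + y \le 0$. Each of
the lemmas in $\mathcal{L}$ blocks $\varphi_2 \land \varphi_3$ but none of them blocks $\varphi_1 \land \varphi_2$. Therefore, we
conjecture $\varphi_1 \land \varphi_2$: $x \ge 10 \land (x + y \ge 10)$.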
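
The blocking claims in this example can be sanity-checked with a brute-force Python sketch over a finite grid of integer points. The grid bounds and helper names are assumptions; an unsatisfiability result on a finite grid is only evidence, whereas an actual implementation would issue SMT queries. Condition (4), which concerns the known reachable states, is not checked here.

```python
from itertools import product

GRID = range(-50, 151)  # finite window of integer values for x and y

phi1 = lambda x, y: x >= 10
phi2 = lambda x, y: x + y >= 10
phi3 = lambda x, y: y <= 10
lemmas = [
    lambda x, y: x + y <= 0 or y >= 101,
    lambda x, y: x + y <= 0 or y >= 102,
]

def sat(*preds):
    """True if some grid point satisfies all the given predicates."""
    return any(all(p(x, y) for p in preds) for x, y in product(GRID, GRID))

# Every lemma blocks beta = phi2 /\ phi3 (their conjunction has no model) ...
blocks_beta = all(not sat(lem, phi2, phi3) for lem in lemmas)
# ... but no lemma blocks alpha = phi1 /\ phi2.
misses_alpha = all(sat(lem, phi1, phi2) for lem in lemmas)

print(blocks_beta, misses_alpha)  # True True
```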


4.6 Putting It All Together 


Having explained the implementation of the new rules for LIA, we now put all 
the ingredients together into an algorithm, GSPACER. In particular, we present 
our choices as to when to apply the new rules, and on which clusters of lemmas 
and POBs. As can be seen in Sect. 5, this implementation works very well on a 
wide range of benchmarks. 

Algorithm 5 presents GSPACER. The comments to the right side of a line
refer to the abstract rules in Algorithms 1 and 2. Just like SPACER, GSPACER
iteratively computes predecessors (line 10) and blocks them (line 14) in an infi- 
nite loop. Whenever a POB is proven to be reachable, the reachable states are 
updated (line 38). If Bad intersects with a reachable state, GSPACER terminates 
and returns UNSAFE (line 12). If one of the frames is an inductive invariant, 
GSPACER terminates with SAFE (line 20). 

When a POB $\langle \varphi, i \rangle$ is handled, we first apply the Concretize rule, if possible
(line 7). Recall that CONCRETIZE (Algorithm 4) takes as input a cluster that
partially blocks $\varphi$ and has a non-linear pattern. To obtain such a cluster, we first
find, using $\mathcal{C}_{pob}(\langle \varphi, i \rangle)$, a cluster $(\pi_1, \mathcal{L}_1) = \mathcal{C}_{\mathcal{O}_k}(\pi_1)$, where $k \le i$, that includes
some lemma (from frame $k$) that blocks $\varphi$; if none exists, $\mathcal{L}_1 = \emptyset$. We then filter
out from $\mathcal{L}_1$ lemmas that completely block $\varphi$ as well as lemmas that are irrelevant
to $\varphi$, i.e., we obtain $\mathcal{L}_2$ by keeping only lemmas that partially block $\varphi$. We
apply CONCRETIZE on $(\pi_1, \mathcal{L}_2)$ to obtain a new POB that under-approximates
$\varphi$ if (1) the remaining sub-cluster, $\mathcal{L}_2$, is non-empty, (2) the pattern, $\pi_1$, is
non-linear, and (3) $\bigwedge \mathcal{L}_2 \land \varphi$ is satisfiable, i.e., a part of $\varphi$ is not blocked by any
lemma in $\mathcal{L}_2$.

Once a POB is blocked, and a new lemma that blocks it, $\ell$, is added to
the frames, an attempt is made to apply the Subsume and Conjecture rules on
a cluster that includes $\ell$. To that end, the function $\mathcal{C}_{lemma}(\ell)$ finds a cluster
$(\pi_3, \mathcal{L}_3) = \mathcal{C}_{\mathcal{O}_i}(\pi_3)$ to which $\ell$ belongs (Sect. 4.2). Note that the choice of cluster
is arbitrary. The rules are applied on $(\pi_3, \mathcal{L}_3)$ if the required pre-conditions are
met (line 49 and line 53, respectively). When applicable, SUBSUME returns a
new lemma that is added to the frames, while CONJECTURE returns a new POB
that is added to the queue. Note that the latter is a may POB, in the sense that
some of the states it represents may not lead to a safety violation.


Ensuring Progress. SPACER always makes progress: as its search continues, it
establishes the absence of counterexamples at deeper and deeper depths. However,
GSPACER does not ensure progress. Specifically, unrestricted application of the
Concretize and Conjecture rules can make GSPACER diverge even on executions
of a fixed bound. In our implementation, we ensure progress by allotting a fixed
amount of gas to each pattern, $\pi$, that forms a cluster. Each time Concretize
or Conjecture is applied to a cluster with $\pi$ as the pattern, $\pi$ loses some gas.
Whenever $\pi$ runs out of gas, the rules are no longer applied to any cluster
with $\pi$ as the pattern. There are finitely many patterns (assuming LIA terms
are normalized). Thus, in each bounded execution of GSPACER, the Concretize
and Conjecture rules are applied only a finite number of times, thereby ensuring
progress. Since the Subsume rule does not hinder progress, it is applied without
any restriction on gas.
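
The gas mechanism described above can be sketched as a small bookkeeping class in Python. The class name, the initial amount of gas, and the cost per rule application are all hypothetical; the text only states that each pattern receives a fixed amount and loses some on every application of Concretize or Conjecture.

```python
class GasBudget:
    """Per-pattern budget that bounds applications of Concretize/Conjecture."""

    def __init__(self, initial_gas=3, cost=1):
        self.initial_gas = initial_gas  # assumed value, not from the paper
        self.cost = cost                # assumed value, not from the paper
        self.gas = {}

    def try_spend(self, pattern):
        """Return True and charge the pattern if it still has enough gas."""
        remaining = self.gas.setdefault(pattern, self.initial_gas)
        if remaining < self.cost:
            return False  # out of gas: the rule is skipped for this pattern
        self.gas[pattern] = remaining - self.cost
        return True

budget = GasBudget(initial_gas=2)
pattern = "v0*x + v1*y + z <= 0"  # a normalized pattern used as a key
print([budget.try_spend(pattern) for _ in range(3)])  # [True, True, False]
```

Keying the budget on the normalized pattern is what makes the total number of applications finite, since there are finitely many patterns in a bounded execution.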


5 Evaluation 


We have implemented² GSPACER (Algorithm 5) as an extension to SPACER. To
reduce the dimension of a matrix (in SUBSUME, Sect. 4.3), we compute pairwise
linear dependencies between all pairs of columns instead of computing the full
kernel. This does not necessarily reduce the dimension of the matrix to its rank,
but it is sufficient for our benchmarks. We have experimented with computing the
full kernel using SageMath [25], but the overall performance did not improve.
Clustering is implemented by anti-unification. LIA terms are normalized using
default Z3 simplifications. Our implementation also supports global generalization
for non-linear CHCs. We have also extended our work to the theory of LRA.
We defer the details of this extension to an extended version of the paper.

² https://github.com/hgvk94/z3/tree/gspacer-cav-ae.
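
A pairwise dependency check between two columns can be sketched in Python as follows. The sketch looks for an affine relation $col_j = a \cdot col_i + b$ that holds on every row, which mirrors the constant column in kernel([N; 1]); treating the dependencies as affine, the function name, and the handling of constant columns are assumptions for illustration. The matrix is the numeral matrix of Example 1, built from the cubes of $S$ under the pattern $x \le v_1 \land -x \le v_2 \land y \le v_3$.

```python
from fractions import Fraction

def affine_dependency(col_i, col_j):
    """Return (a, b) with col_j = a * col_i + b on every row, or None."""
    for k in range(1, len(col_i)):
        if col_i[k] != col_i[0]:
            # Two rows with different col_i values fix a and b.
            a = Fraction(col_j[k] - col_j[0], col_i[k] - col_i[0])
            b = col_j[0] - a * col_i[0]
            if all(cj == a * ci + b for ci, cj in zip(col_i, col_j)):
                return a, b
            return None
    return None  # col_i is constant; not handled in this sketch

N = [[2, -2, 3], [4, -4, 5], [8, -8, 9]]
cols = list(zip(*N))
print(affine_dependency(cols[0], cols[1]))  # v2 = -v1
print(affine_dependency(cols[0], cols[2]))  # v3 = v1 + 1
```

Both dependencies match the equalities $(v_2 = -v_1)$ and $(v_3 = 1 + v_1)$ used in Example 1.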

To evaluate our implementation, we have conducted two sets of experiments³.
All experiments were run on an Intel E5-2690 V2 CPU at 3 GHz with 128 GB of
memory, with a timeout of 10 min. First, to evaluate the performance of local reasoning
with global guidance against pure local reasoning, we have compared GSPACER
with the latest SPACER, to which we refer as the baseline. We took the benchmarks
from CHC-COMP 2018 and 2019 [10]. We compare to SPACER because it
dominated the competition by solving 85% of the benchmarks in CHC-COMP
2019 (20% more than the runner-up) and 60% of the benchmarks in CHC-COMP
2018 (10% more than the runner-up). Our evaluation shows that GSPACER
outperforms SPACER both in terms of number of solved instances and, more
importantly, in overall robustness.

Second, to examine the performance of local reasoning with global guidance 
compared to solely global reasoning, we have compared GSPACER with an ML- 
based data-driven invariant inference tool LINEARARBITRARY [28]. Compared to 
other similar approaches, LINEARARBITRARY stands out by supporting invari- 
ants with arbitrary Boolean structure over arbitrary linear predicates. It is com- 
pletely automated and does not require user-provided predicates, grammars, or 
any other guidance. For the comparison with LINEARARBITRARY, we have used 
both the CHC-COMP benchmarks, as well as the benchmarks from the artifact 
evaluation of [28]. The machine and timeout remain the same. Our evaluation 
shows that GSPACER is superior in this case as well. 


Comparison with SPACER. Table 1 summarizes the comparison between SPACER 
and GSPACER on CHC-COMP instances. Since both tools can use a variety of 
interpolation strategies during lemma generalization (Line 45 in Algorithm 5), 
we compare three different configurations of each: bw and fw stand for two inter- 
polation strategies, backward and forward, respectively, already implemented in 
SPACER, and sc stands for turning interpolation off and generalizing lemmas 
only by subset clauses computed by inductive generalization. 

Any configuration of GSPACER solves significantly more instances than even
the best configuration of SPACER. Figure 2 provides a more detailed comparison
between the best configurations of both tools in terms of running time and depth
of convergence. There is no clear trend in terms of running time on instances
solved by both tools. This is not surprising: SMT-solving run time is highly
non-deterministic and any change in strategy has a significant impact on the performance
of the SMT queries involved. In terms of depth, it is clear that GSPACER converges
at the same or lower depth. The depth is significantly lower for instances solved
only by GSPACER.

Moreover, the performance of GSPACER is not significantly affected by the
interpolation strategy used. In fact, the configuration sc in which interpolation is
disabled performs the best in CHC-COMP 2018, and only slightly worse in
CHC-COMP 2019! In comparison, disabling interpolation hurts SPACER significantly.

³ Detailed experimental results including the effectiveness of each rule, and the
extensions to non-linear CHCs and LRA can be found at https://hgvk94.github.io/gspacer/.

Figure 3 provides a detailed comparison of GSPACER with and without inter- 
polation. Interpolation makes no difference to the depth of convergence. This 
implies that lemmas that are discovered by interpolation are discovered as effi- 
ciently by the global rules of GSPACER. On the other hand, interpolation signif- 
icantly increases the running time. Interestingly, the time spent in interpolation 
itself is insignificant. However, the lemmas produced by interpolation tend to 
slow down other aspects of the algorithm. Most of the slow down is in increased 
time for inductive generalization and in computation of predecessors. The com- 
parison between the other interpolation-enabled strategy and GSPACER (sc) 
shows a similar trend. 


Table 1. Comparison between SPACER and GSPACER on CHC-COMP.

                         SPACER                                 GSPACER
Bench         fw           bw           sc           fw           bw           sc           VBS
          safe unsafe  safe unsafe  safe unsafe  safe unsafe  safe unsafe  safe unsafe  safe unsafe
CHC-18     159   66     163   69     123   68     214   67     214   63     214   69     229   74
CHC-19     193   84     186   84     125   84     202   84     196   85     200   84     207   85


[Figure 2: two scatter plots of Spacer(bw) (vertical axis) against GSpacer(fw)
(horizontal axis): (a) running time, (b) depth explored.]

Fig. 2. Best configurations: GSPACER versus SPACER.


Fig. 3. Comparing GSPACER with different interpolation tactics.


Comparison with LINEARARBITRARY. In [28], the authors show that LINEARARBITRARY,
to which we refer as LARB for short, significantly outperforms
SPACER on a curated subset of benchmarks from the SV-COMP [24] competition.
At first, we attempted to compare LARB against GSPACER on the CHC-COMP
benchmarks. However, LARB did not perform well on them. Even the
baseline SPACER has outperformed LARB significantly. Therefore, for a more
meaningful comparison, we have also compared SPACER, LARB and GSPACER
on the benchmarks from the artifact evaluation of [28]. The results are summarized
in Table 2. As expected, LARB outperforms the baseline SPACER on
the safe benchmarks. On unsafe benchmarks, SPACER is significantly better
than LARB. In both categories, GSPACER dominates, solving more safe benchmarks
than either SPACER or LARB, while matching the performance of SPACER
on unsafe instances. Furthermore, GSPACER remains orders of magnitude faster
than LARB on benchmarks that are solved by both. This comparison shows
that incorporating local reasoning with global guidance not only mitigates its
shortcomings but also surpasses global data-driven reasoning.


Table 2. Comparison with LARB.

Bench       SPACER        LARB        GSPACER        VBS
          safe unsafe  safe unsafe  safe unsafe  safe unsafe
PLDI18     216   68     270   65     279   68     284   68


6 Related Work 


The limitations of local reasoning in SMT-based infinite state model checking 
are well known. Most commonly, they are addressed with either (a) different 
strategies for local generalization in interpolation (e.g., [1,6,19,23]), or (b) shift- 
ing the focus to global invariant inference by learning an invariant of a restricted 
shape (e.g., [9, 14-16, 28]). 


Interpolation Strategies. Albarghouthi and McMillan [1] suggest to minimize the
number of literals in an interpolant, arguing that simpler (i.e., fewer half-spaces)
interpolants are more likely to generalize. This helps with myopic generalizations
(Fig. 1(a)), but not with excessive generalizations (Fig. 1(b)). On the contrary,
Blicha et al. [6] decompose interpolants to be numerically simpler (but with more
literals), which helps with excessive, but not with myopic, generalizations. Deciding
locally between these two techniques or on their combination (i.e., some parts
of an interpolant might need to be split while others combined) seems impossible.
Schindler and Jovanović [23] propose local interpolation that bounds the
number of lemmas generated from a single POB (which helps with Fig. 1(c)), but
only if inductive generalization is disabled. Finally, [19] suggests using external
guidance, in the form of predicates or terms, to guide interpolation. In contrast,
GSPACER uses global guidance, based on the current proof, to direct different
local generalization strategies. Thus, the guidance is automatically tuned to the
specific instance at hand rather than to a domain of problems.


Global Invariant Inference. An alternative to inferring lemmas for the inductive
invariant by blocking counterexamples is to enumerate the space of potential
candidate invariants [9,14-16,28]. This does not suffer from the pitfall of local
reasoning. However, it is only effective when the search space is constrained. 
While these approaches perform well on their target domain, they do not gener- 
alize well to a diverse set of benchmarks, as illustrated by results of CHC-COMP 
and our empirical evaluation in Sect. 5. 


Locality in SMT and IMC. Local reasoning is also a known issue in SMT, and, in 
particular, in DPLL(T) (e.g., [22]). However, we are not aware of global guidance 
techniques for SMT solvers. Interpolation-based Model Checking (IMC) [20,21] 
that uses interpolants from proofs, inherits the problem. Compared to IMC, 
the propagation phase and inductive generalization of IC3 [7], can be seen as 
providing global guidance using lemmas found in other parts of the search-space. 
In contrast, GSPACER magnifies such global guidance by exploiting patterns 
within the lemmas themselves. 


IC3-SMT-based Model Checkers. There are a number of IC3-style SMT-based
infinite state model checkers, including [11,17,18]. To our knowledge, none
extend the IC3-SMT framework with global guidance. A rule similar to Subsume
is suggested in [26] for the theory of bit-vectors and in [4] for LRA, but in both 
cases without global guidance. In [4], it is implemented via a combination of syn- 
tactic closure with interpolation, whereas we use MBP instead of interpolation. 
Refinement State Mining in [3] uses similar insights to our Subsume rule to refine 
predicate abstraction. 


7 Conclusion and Future Work 


This paper introduces global guidance to mitigate the limitations of the local
reasoning performed by SMT-based IC3-style model checking algorithms. Global
guidance is necessary to redirect such algorithms from divergence due to persistent
local reasoning. To this end, we present three general rules that introduce
new lemmas and POBs by taking a global view of the lemmas learned so far. The
new rules are not theory-specific, and, as demonstrated by Algorithm 5, can
be incorporated into IC3-style solvers without modifying the existing architecture.
We instantiate, and implement, the rules for LIA in GSPACER, which extends
SPACER.

Our evaluation shows that global guidance brings significant improvements 
to local reasoning, and surpasses invariant inference based solely on global rea- 
soning. More importantly, global guidance decouples SPACER’s dependency on 
interpolation strategy and performs almost equally well under all three inter- 
polation schemes we consider. As such, using global guidance in the context of 
theories for which no good interpolation procedure exists, with bit-vectors being 
a primary example, arises as a promising direction for future research. 
Abstract. In software-defined networks (SDN), a controller program is
in charge of deploying diverse network functionality across a large num- 
ber of switches, but this comes at a great risk: deploying buggy controller 
code could result in network and service disruption and security loop- 
holes. The automatic detection of bugs or, even better, verification of 
their absence is thus most desirable, yet the size of the network and the 
complexity of the controller makes this a challenging undertaking. In this 
paper, we propose MOCS, a highly expressive, optimised SDN model that 
allows capturing subtle real-world bugs, in a reasonable amount of time. 
This is achieved by (1) analysing the model for possible partial order 
reductions, (2) statically pre-computing packet equivalence classes and 
(3) indexing packets and rules that exist in the model. We demonstrate 
its superiority compared to the state of the art in terms of expressivity, 
by providing examples of realistic bugs that a prototype implementation 
of MOCS in UPPAAL caught, and performance/scalability, by running 
examples on various sizes of network topologies, highlighting the impor- 
tance of our abstractions and optimisations. 


1 Introduction 


Software-Defined Networking (SDN) [16] has brought about a paradigm shift in 
designing and operating computer networks. A logically centralised controller 
implements the control logic and 'programs' the data plane, which is defined by
flow tables installed in network switches. SDN enables the rapid development 
of advanced and diverse network functionality; e.g. in designing next-generation 
inter-data centre traffic engineering [10], load balancing [19], firewalls [24], and 
Internet exchange points (IXPs) [15]. SDN has gained noticeable ground in the 
industry, with major vendors integrating OpenFlow [37], the de-facto SDN stan- 
dard maintained by the Open Networking Forum, in their products. Operators 
deploy it at scale [27,38]. SDN presents a unique opportunity for innovation and 
rapid development of complex network services by enabling all players, not just 
vendors, to develop and deploy control and data plane functionality in networks. 
This comes at a great risk; deploying buggy code at the controller could result
in problematic flow entries at the data plane and, potentially, service disrup- 
tion [13,18,47,49] and security loopholes [7,26]. Understanding and fixing such 
bugs is far from trivial, given the distributed and concurrent nature of computer
networks and the complexity of the control plane [44]. 

With the advent of SDN, a large body of research on verifying network prop- 
erties has emerged [33]. Static network analysis approaches [2,11,30,34,45,51] 
can only verify network properties on a given fixed network configuration but this 
may be changing very quickly (e.g. as in [1]). Another key limitation is the fact 
that they cannot reason about the controller program, which, itself, is respon- 
sible for the changes in the network configuration. Dynamic approaches, such 
as [23,29,31,40,48,50], are able to reason about network properties as changes 
happen (i.e. as flow entries in switches' flow tables are being added and deleted), 
but they cannot reason about the controller program either. As a result, when 
a property violation is detected, there is no straightforward way to fix the bug 
in the controller code, as these systems are oblivious of the running code. Iden- 
tifying bugs in large and complex deployments can be extremely challenging. 

Formal verification methods that include the controller code in the model 
of the network can solve this important problem. Symbolic execution meth- 
ods, such as [5,8,11,12,14,28,46], evaluate programs using symbolic variables 
accumulating path-conditions along the way that then can be solved logically. 
However, they suffer from the path explosion problem caused by loops and func- 
tion calls which means verification does not scale to larger controller programs 
(bug finding still works but is limited). Model checking SDNs is a promising area 
even though only few studies have been undertaken [3,8,28,35,36,43]. Networks 
and controller can be naturally modelled as transition systems. State explosion 
is always a problem but can be mitigated by using abstraction and optimisa- 
tion techniques (i.e. partial order reductions). At the same time, modern model 
checkers [6,9,20,21,25] are very efficient.

NetSMC [28] uses a bespoke symbolic model checking algorithm for checking 
properties given a subset of computation tree logic that allows quantification 
only over all paths. As a result, this approach scales relatively well, but the 
requirement that only one packet can travel through the network at any time 
is very restrictive and ignores race conditions. NICE [8] employs model checking 
but only looks at a limited number of input packets that are extracted through
symbolically executing the controller code. As a result, it is a bug-finding tool 
only. The authors in [43] propose a model checking approach that can deal 
with dynamic controller updates and an arbitrary number of packets but require 
manually inserted non-interference lemmas that constrain the set of packets that 
can appear in the network. This significantly limits its applicability in realistic 
network deployments. Kuai [35] overcomes this limitation by introducing model- 
specific partial order reductions (PORs) that result in pruning the state space 
by avoiding redundant explorations. However, it has limitations explained at the 
end of this section. 

In this paper, we take a step further towards the full realisation of model
checking real-world SDNs by introducing MOCS (MOdel Checking for Software
defined networks)¹, a highly expressive, optimised SDN model which we
implemented in UPPAAL² [6]. MOCS, compared to the state of the art in model
checking SDNs, can model network behaviour more realistically and verify larger
deployments using fewer resources. The main contributions of this paper are:

¹ A release of MOCS is publicly available at https://tinyurl.com/y95qtv5k.


Model Generality. The proposed network model is closer to the Open- 
Flow standard than previous models (e.g. [35]) to reflect commonly exhibited 
behaviour between the controller and network switches. More specifically, it 
allows for race conditions between control messages and includes a significant 
number of OpenFlow interactions, including barrier response messages. In our 
experimentation section, we present families of elusive bugs that can be efficiently 
captured by MOCS. 


Model Checking Optimisations. To tackle the state explosion problem we 
propose context-dependent partial order reductions by considering the concrete 
control program and specification in question. We establish the soundness of 
the proposed optimisations. Moreover, we propose state representation optimi- 
sations, namely packet and rule indexing, identification of packet equivalence 
classes and bit packing, to improve performance. We evaluate the benefits from 
all proposed optimisations in Sect. 4. 

Our model has been inspired by Kuai [35]. According to the contributions 
above, however, we consider MOCS to be a considerable improvement. We model 
more OpenFlow messages and interactions, enabling us to check for bugs that [35] 
cannot even express (see discussion in Sect. 4.2). Our context-dependent PORs 
systematically explore possibilities for optimisation. Our optimisation techniques 
still allow MOCS to run at least as efficiently as Kuai, often with even better 
performance. 


2 Software-Defined Network Model 


A key objective of our work is to enable the verification of network-wide proper- 
ties in real-world SDNs. In order to fulfill this ambition, we present an extended 
network model to capture complex interactions between the SDN controller 
and the network. Below we describe the adopted network model, its state and 
transitions. 


2.1 Formal Model Definition 


The formal definition of the proposed SDN model is by means of an action- 
deterministic transition system. We parameterise the model by the underlying 
network topology λ and the controller program CP in use, as explained further
below (Sect. 2.2). 


Definition 1. An SDN model is a 6-tuple M_(λ,CP) = (S, s₀, A, →, AP, L), where
S is the set of all states the SDN may enter, s₀ the initial state, A the set of
actions which encode the events the network may engage in, → ⊆ S × A × S
the transition relation describing which execution steps the system undergoes as
it performs actions, AP a set of atomic propositions describing relevant state
properties, and L : S → 2^AP is a labelling function, which relates to any state
s ∈ S a set L(s) ∈ 2^AP of those atomic propositions that are true for s. Such
an SDN model is composed of several smaller systems, which model network
components (hosts, switches and the controller) that communicate via queues
and, combined, give rise to the definition of →. The states of an SDN transition
system are 3-tuples (π, δ, γ), where π represents the state of each host, δ the
state of each switch, and γ the controller state. The components are explained
in Sect. 2.2 and the transitions → in Sect. 2.3.

² UPPAAL has been chosen as future plans include extending the model to timed actions
such as timeouts. Note that the model can be implemented in any model checker.


Figure 1 illustrates a high-level view of OpenFlow interactions (left side),
modelled actions and queues (right side).

Fig. 1. A high-level view of OpenFlow interactions using OpenFlow specification
terminology (left half) and the modelled actions (right half). A red solid-line arrow depicts
an action which, when fired, (1) dequeues an item from the queue the arrow begins
at, and (2) adds an item in the queue the arrowhead points to (or multiple items if
the arrow is double-headed). Deleting an item from the target queue is denoted by
a reverse arrowhead. A forked arrow denotes multiple targeted queues. (Color figure
online)


2.2 SDN Model Components 


Throughout we will use the common “dot-notation” (_._) to refer to components
of composite gadgets (tuples), e.g. queues of switches, or parts of the state. We
use obvious names for the projection functions like s.δ.sw.pq for the packet
queue of the switch sw in state s. At times we will also use t₁ and t₂ for the first
and second projection of tuple t.


Network Topology. A location (n, pt) is a pair of a node (host or switch)
n and a port pt. We describe the network topology as a bijective function
λ : (Switches ∪ Hosts) × Ports → (Switches ∪ Hosts) × Ports consisting of a set
of directed edges ((n, pt), (n′, pt′)), where pt′ is the input port of the switch or
host n′ that is connected to port pt at host or switch n. Hosts, Switches and
Ports are the (finite) sets of all hosts, switches and ports in the network,
respectively. The topology function is used when a packet needs to be forwarded in
the network. The location of the next hop node is decided when a send, match
or fwd action (all defined further below) is fired. Every SDN model is defined
w.r.t. a fixed topology λ that does not change.
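
As a minimal sketch (not part of MOCS), the topology function can be pictured as a dictionary from locations to locations. The node names, port numbers and helper functions below are hypothetical and chosen only for illustration.

```python
# Sketch of a topology function as a bijection on locations.
# Node names (h1, s1, ...) and port numbers are hypothetical.
from typing import Dict, Tuple

Location = Tuple[str, int]  # (node, port)

# Each directed edge maps an output location to the input location it feeds.
topology: Dict[Location, Location] = {
    ("h1", 1): ("s1", 1),
    ("s1", 1): ("h1", 1),
    ("s1", 2): ("s2", 1),
    ("s2", 1): ("s1", 2),
    ("s2", 2): ("h2", 1),
    ("h2", 1): ("s2", 2),
}


def is_bijective(topo: Dict[Location, Location]) -> bool:
    """A dict is already a function; it is a bijection on its locations if
    the images are pairwise distinct and cover exactly the set of keys."""
    images = list(topo.values())
    return len(set(images)) == len(images) and set(images) == set(topo.keys())


def next_hop(topo: Dict[Location, Location], node: str, port: int) -> Location:
    """Location reached when `node` emits a packet on `port`."""
    return topo[(node, port)]
```

In this reading, the first projection of `next_hop(...)` is the node whose queue receives the packet when a send, match or fwd action fires.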


Packets. Packets are modelled as finite bit vectors and transferred in the
network by being stored in the queues of the various network components. A
packet ∈ Packets (the set of all packets that can appear in the network) contains
bits describing the proof-relevant header information and its location loc.


Hosts. Each host ∈ Hosts has a packet queue (rcvq) and a finite set of ports
which are connected to ports of switches. A host can send a packet to one
or more switches it is connected to (send action in Fig. 1) or receive a packet
from its own rcvq (recv action in Fig. 1). Sending occurs repeatedly in a
non-deterministic fashion which we model implicitly via the (0, ∞) abstraction at
switches' packet queues, as discussed further below.


Switches. Each switch ∈ Switches has a flow table (ft), a packet queue (pq),
a control queue (cq), a forwarding queue (fq) and one or more ports, through
which it is connected to other switches and/or hosts. A flow table ft ⊆ Rules is a
set of forwarding rules (with Rules being the set of all rules). Each one consists
of a tuple (priority, pattern, ports), where priority ∈ ℕ determines the priority
of the rule over others, pattern is a proposition over the proof-relevant header
of a packet, and ports is a subset of the switch's ports. Switches match packets
in their packet queues against rules (i.e. their respective pattern) in their flow
table (match action in Fig. 1) and forward packets to a connected device (or final
destination), accordingly. Packets that cannot be matched to any rule are sent to
the controller's request queue (rq) (nomatch action in Fig. 1); in OpenFlow, this
is done by sending a PacketIn message. The forwarding queue fq stores packets
forwarded by the controller in PacketOut messages. The control queue stores
messages sent by the controller in FlowMod and BarrierReq messages. FlowMod
messages contain instructions to add or delete rules from the flow table (that
trigger add and del actions in Fig. 1). BarrierReq messages contain barriers to
synchronise the addition and removal of rules. MOCS conforms to the OpenFlow
specifications and always executes instructions in an interleaved fashion obeying
the ordering constraints imposed by barriers.


OpenFlow Controller. The controller is modelled as a finite state automaton
embedded into the overall transition system. A controller program CP, as used
to parametrise an SDN model, consists of (CS, pktIn, barrierIn). It uses its own
local state cs ∈ CS, where CS is the finite set of control program states. Incoming
PacketIn and BarrierRes messages from the SDN model are stored in separate
queues (rq and brq, respectively) and trigger ctrl or bsync actions (see Fig. 1) 
which are then processed by the controller program in its current state. The 
controller’s corresponding handler, pktIn for PacketIn messages and barrierIn 
for BarrierRes messages, responds by potentially changing its local state and 
sending messages to a subset of Switches, as follows. A number of PacketOut 
messages (pairs of pkt, ports) can be sent to a subset of Switches. Such a message 
is stored in a switch's forward queue and instructs it to forward packet pkt 
along the ports ports. The controller may also send any number of FlowMod 
and BarrierReq messages to the control queue of any subset of Switches. A 
FlowMod message may contain an add or delete rule modification instruction. 
These are executed in an arbitrary order by switches, and barriers are used to
synchronise their execution. Barriers are sent by the controller in BarrierReq 
messages. OpenFlow requires that a response message (BarrierRes) is sent to 
the controller by a switch when a barrier is consumed from its control queue 
so that the controller can synchronise subsequent actions. Our model includes a 
brepl action that models the sending of a BarrierRes message from a switch to 
the controller's barrier reply queue (brq), and a bsync action that enables the 
controller program to react to barrier responses. 


Queues. All queues in the network are modelled as finite state. Packet queues pq
for switches are modelled as multisets, and we adopt the (0, ∞) abstraction [41]; i.e.
a packet is assumed to appear either zero or an arbitrary (unbounded) number
of times in the respective multiset. This means that once a packet has arrived
at a switch or host, (infinitely) many other packets of the same kind repeatedly
arrive at this switch or host. Switches' forwarding queues fq are, by contrast,
modelled as sets, therefore if multiple identical packets are sent by the controller
to a switch, only one will be stored in the queue and eventually forwarded by
the switch. The controller’s request rq and barrier reply queues brq are modelled
as sets as well. Hosts’ receive queues rcvq are also modelled as sets. Control
queues cq at switches are modelled as a finite sequence of sets of control messages
(representing add and remove rule instructions), interleaved by any number of
barriers. As the number of barriers that can appear at any execution is finite,
this sequence is finite.
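
The queue abstractions can be pictured with ordinary Python containers. The sketch below is only an illustration under our own assumptions: the class and method names are hypothetical, packets are plain strings, and a (0, ∞) multiset is represented by a set whose membership stands for "arbitrarily many copies".

```python
# Sketch of the queue abstractions; all class and field names are hypothetical.
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set, Tuple, Union

Packet = str
CtrlMsg = Tuple[str, str]  # ("add" | "del", rule identifier)


@dataclass(frozen=True)
class Barrier:
    xid: int


@dataclass
class SwitchQueues:
    # (0, ∞) abstraction: membership means "arbitrarily many copies present".
    pq: Set[Packet] = field(default_factory=set)
    # Forwarding queue as a set: identical PacketOut messages collapse into one.
    fq: Set[Tuple[Packet, FrozenSet[int]]] = field(default_factory=set)
    # Control queue: finite sequence of message sets interleaved with barriers.
    cq: List[Union[Set[CtrlMsg], Barrier]] = field(default_factory=list)

    def enqueue_ctrl(self, msg: CtrlMsg) -> None:
        """Add a rule instruction to the set at the tail of cq, opening a new
        set if cq is empty or currently ends with a barrier."""
        if not self.cq or isinstance(self.cq[-1], Barrier):
            self.cq.append(set())
        self.cq[-1].add(msg)

    def enqueue_barrier(self, xid: int) -> None:
        """Append a barrier; later instructions go into a fresh set."""
        self.cq.append(Barrier(xid))
```

Instructions within one set carry no relative order, which mirrors the arbitrary execution order of FlowMod messages between two barriers.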


2.3 Guarded Transitions 


Here we provide a detailed breakdown of the transition relation s -α(a⃗)→ s′ for
each action α(a⃗) ∈ A(s), where A(s) is the set of all enabled actions in s in the
proposed model (see Fig. 1). Transitions are labelled by action names α with
arguments a⃗. The transitions are only enabled in state s if s satisfies certain
conditions called guards that can refer to the arguments a⃗. In guards, we make
use of predicate bestmatch(sw, r, pkt) that expresses that r is the highest priority
rule in sw.ft that matches pkt’s header. Below we list all possible actions with
their respective guards.


send(h, pt, pkt). Guard: true. This transition models packets arriving in the
network in a non-deterministic fashion. When it is executed, pkt is added to
the packet queue of the network switch connected to the port pt of host h (or,
formally, to λ(h, pt)₁.pq, where λ is the topology function described above). As
described in Sect. 3.2, only relevant representatives of packets are actually sent
by end-hosts. This transition is unguarded, therefore it is always enabled.


recv(h, pkt). Guard: pkt ∈ h.rcvq. This transition models hosts receiving (and
removing) packets from the network and is enabled if pkt is in h’s receive queue.


match(sw, pkt, r). Guard: pkt ∈ sw.pq ∧ r ∈ sw.ft ∧ bestmatch(sw, r, pkt). This
transition models matching and forwarding packet pkt to zero or more next hop
nodes (hosts and switches), as a result of highest priority matching of rule r with
pkt. The packet is then copied to the packet queues of the connected hosts and/or
switches, by applying the topology function to the port numbers in the matched
rule; i.e. λ(sw, pt)₁.pq, ∀pt ∈ r.ports. Dropping packets is modelled by having a
special ‘drop’ port that can be included in rules. The location of the forwarded
packet(s) is updated with the respective destination (switch/host, port) pair; i.e.
λ(sw, pt). Due to the (0, ∞) abstraction, the packet is not removed from sw.pq.


nomatch(sw, pkt). Guard: pkt ∈ sw.pq ∧ ∄r ∈ sw.ft . bestmatch(sw, r, pkt).
This transition models forwarding a packet to the OpenFlow controller when a
switch does not have a rule in its forwarding table that can be matched against
the packet header. In this case, pkt is added to rq for processing. pkt is not
removed from sw.pq due to the supported (0, ∞) abstraction.


ctrl(sw, pkt, cs). Guard: pkt ∈ controller.rq. This transition models the
execution of the packet handler by the controller when packet pkt that was
previously sent by sw is available in rq. The controller's packet handler function
pktIn(sw, pkt, cs) is executed which, in turn (i) reads the current controller state
cs and changes it according to the controller program, (ii) adds a number of rules,
interleaved with any number of barriers, into the cq of zero or more switches,
and (iii) adds zero or more forwarding messages, each one including a packet
along with a set of ports, to the fq of zero or more switches.


fwd(sw, pkt, ports). Guard: (pkt, ports) ∈ sw.fq. This transition models
forwarding packet pkt that was previously sent by the controller to sw's forwarding
queue sw.fq. In this case, pkt is removed from sw.fq (which is modelled as a
set), and added to the pq of a number of network nodes (switches and/or hosts),
as defined by the topology function λ(sw, pt)₁.pq, ∀pt ∈ ports. The location of
the forwarded packet(s) is updated with the respective destination (switch/host,
port) pair; i.e. λ(sw, pt).


FM(sw, r), where FM ∈ {add, del}. Guard: (FM, r) ∈ head(sw.cq). These
transitions model the addition and deletion, respectively, of a rule in the flow
table of switch sw. They are enabled when one or more add and del control
messages are in the set at the head of the switch's control queue. In this case,
r is added to, or deleted from, sw.ft, respectively, and the control message
is deleted from the set at the head of cq. If the set at the head of cq becomes
empty it is removed. If then the next item in cq is a barrier, a brepl transition
becomes enabled (see below).


brepl(sw, xid). Guard: b(xid) = head(sw.cq). This transition models a switch
sending a barrier response message, upon consuming a barrier from the head of
its control queue; i.e. if b(xid) is the head of sw.cq, where xid ∈ ℕ is an identifier
for the barrier set by the controller, b(xid) is removed and the barrier reply
message br(sw, xid) is added to the controller's brq.


bsync(sw, xid, cs). Guard: br(sw, xid) ∈ controller.brq. This transition models
the execution of the barrier response handler by the controller when a barrier
response sent by switch sw is available in brq. In this case, br(sw, xid) is removed
from the brq, and the controller's barrier handler barrierIn(sw, xid, cs) is
executed which, in turn (i) reads the current controller state cs and changes it
according to the controller program, (ii) adds a number of rules, interleaved
with any number of barriers, into the cq of zero or more switches, and (iii) adds
zero or more forwarding messages, each one including a packet along with a set
of ports, to the fq of zero or more switches.
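
To make the guards of match and nomatch concrete, the following sketch evaluates bestmatch for a single switch. It is an illustration under our own assumptions, not the UPPAAL encoding: the `Rule` class, the dictionary-based packet header and the function names are hypothetical, and ties between rules of equal priority are broken by table order.

```python
# Sketch of bestmatch and the match / nomatch guards for one switch.
# Rule, the dict-based header and all function names are hypothetical.
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional

Header = Dict[str, int]


@dataclass(frozen=True)
class Rule:
    priority: int
    pattern: Callable[[Header], bool]  # proposition over the packet header
    ports: FrozenSet[int]


def best_match(ft: List[Rule], header: Header) -> Optional[Rule]:
    """Highest-priority rule in the flow table whose pattern holds, if any."""
    candidates = [r for r in ft if r.pattern(header)]
    return max(candidates, key=lambda r: r.priority) if candidates else None


def enabled_actions(ft: List[Rule], pq_headers: List[Header]) -> List[str]:
    """For every packet in pq exactly one of match / nomatch is enabled."""
    actions = []
    for header in pq_headers:
        rule = best_match(ft, header)
        actions.append("nomatch" if rule is None else f"match(prio={rule.priority})")
    return actions
```

The two guards are mutually exclusive by construction: nomatch is enabled for a packet precisely when no rule satisfies bestmatch for it.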


An Example Run. In Fig. 2, we illustrate a sequence of MOCS transitions
through a simple packet forwarding example. The run starts with a send
transition; packet p is copied to the packet queue of the switch in black. Initially,
switches’ flow tables are empty, therefore p is copied to the controller’s request
queue (nomatch transition); note that p remains in the packet queue of the
switch in black due to the (0, ∞) abstraction. The controller's packet handler is
then called (ctrl transition) and, as a result, (1) p is copied to the forwarding
queue of the switch in black, (2) rule r₁ is copied to the control queue of the
switch in black, and (3) rule r₂ is copied to the control queue of the switch in
white. Then, the switch in black forwards p to the packet queue of the switch
in white (fwd transition). The switch in white installs r₂ in its flow table (add
transition) and then matches p with the newly installed rule and forwards it to
the receive queue of the host in white (match transition), which removes it from
the network (recv transition).


2.4 Specification Language 


In order to specify properties of packet flow in the network, we use LTL formulas
without the “next-step” operator ○³, where atomic formulae denote properties of
states of the transition system, i.e. the SDN network. In the case of safety properties,
i.e. an invariant w.r.t. states, the LTL\{○} formula is of the form □φ, i.e. has
only an outermost □ temporal connective.

Let P denote unary predicates on packets which encode a property of a
packet based on its fields. An atomic state condition (proposition) in AP is
either of the following: (i) existence of a packet pkt located in a packet queue
(pq) of a switch or in a receive queue (rcvq) of a host that satisfies P (we
denote this by ∃pkt∈n.pq . P(pkt) with n ∈ Switches, and ∃pkt∈h.rcvq . P(pkt)
with h ∈ Hosts)⁴; (ii) the controller is in a specific controller state q ∈ CS,
denoted by a unary predicate symbol Q(q) which holds in system state s ∈ S
if q = s.γ.cs. The specification logic comprises first-order formulae with equality
on the finite domains of switches, hosts, rule priorities, and ports which are
state-independent (and decidable).

³ This is the largest set of formulae supporting the partial order reductions used in
Sect. 3, as stutter equivalence does not preserve the truth value of formulae with
the ○.

Fig. 2. Forwarding p from the sending host to the receiving host. Non greyed-out
icons are the ones whose state changes in the current transition.

For example, ∃pkt∈sw.pq . P(pkt) represents the fact that the packet predicate
P(.) is true for at least one packet pkt in the pq of switch sw. For every
atomic packet proposition P(pkt), also its negation ¬P(pkt) is an atomic
proposition for the reason of simplifying syntactic checks of formulae in Table 1 in
the next section. Note that universal quantification over packets in a queue
is a derived notion. For instance, ∀pkt∈n.pq . P(pkt) can be expressed as
∄pkt∈n.pq . ¬P(pkt). Universal and existential quantification over switches or
hosts can be expressed by finite iterations of ∧ and ∨, respectively.

In order to be able to express that a condition holds when a certain event
happened, we add to our propositions instances of propositional dynamic logic
[17,42]. Given an action α(·) ∈ A and a proposition P that may refer to any
variables in x⃗, [α(x⃗)]P is also a proposition and [α(x⃗)]P is true if, and only if,
after firing transition α(a⃗) (to get to the current state), P holds with the variables
in x⃗ bound to the corresponding values in the actual arguments a⃗. With the help
of those basic modalities one can then also specify that more complex events
occurred. For instance, dropping of a packet due to a match or fwd action can
be expressed by [match(sw, pkt, r)](r.fwd_port = drop) ∧ [fwd(sw, pkt, pt)](pt =
drop). Such predicates derived from modalities are used in [32] (extended version
of this paper, with proofs and controller programs), Appendix B-CP5.


⁴ Note that these are atomic propositions despite the use of the existential quantifier
notation.

The meaning of temporal LTL operators is standard depending on the trace
of a transition sequence s₀ → s₁ → .... The trace L(s₀)L(s₁)...L(sᵢ)... is
defined as usual. For instance, trace L(s₀)L(s₁)L(s₂)... satisfies invariant □φ if
each L(sᵢ) implies φ.
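
The invariant case can be illustrated on a finite trace prefix. The sketch below is our own illustration: labels are sets of hypothetical proposition names, and a finite prefix can only refute an invariant, never establish it for the infinite trace.

```python
# Sketch: checking an invariant on a finite prefix of a trace.
# Labels are sets of atomic-proposition names; the names are hypothetical.
from typing import Callable, FrozenSet, Iterable, Optional

Label = FrozenSet[str]


def first_violation(trace: Iterable[Label],
                    phi: Callable[[Label], bool]) -> Optional[int]:
    """Index of the first label that falsifies phi, or None if phi holds
    for every label of the given prefix."""
    for i, label in enumerate(trace):
        if not phi(label):
            return i
    return None
```

A returned index identifies the state of a counterexample run at which the invariant breaks.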


3 Model Checking 


In order to verify desired properties of an SDN, we use its model as described in 
Definition 1 and apply model checking. In the following we propose optimisations 
that significantly improve the performance of model checking. 


3.1 Contextual Partial-Order Reduction 


Partial order reduction (POR) [39] reduces the number of interleavings (traces) 
one has to check. Here is a reminder of the main result (see [4]) where we use a 
stronger condition than the regular (C4) to deal with cycles: 


Theorem 1 (Correctness of POR). Given a finite transition system M =
(S, A, →, s₀, AP, L) that is action-deterministic and without terminal states, let
A(s) denote the set of actions in A enabled in state s ∈ S. Let ample(s) ⊆ A(s)
be a set of actions for a state s ∈ S that satisfies the following conditions:


C1 (Non)emptiness condition: ∅ ≠ ample(s) ⊆ A(s).


C2 Dependency condition: Let s -β₁→ s₁ -β₂→ ... -βₙ→ sₙ -β→ t be a run in M. If
β ∈ A \ ample(s) depends on ample(s), then βᵢ ∈ ample(s) for some 0 < i ≤ n,
which means that in every path fragment of M, β cannot appear before some
transition from ample(s) is executed.

C3 Invisibility condition: If ample(s) ≠ A(s) (i.e., state s is not fully expanded),
then every α ∈ ample(s) is invisible.

C4 Every cycle in M^ample contains a state s such that ample(s) = A(s).


where M^ample = (S_a, A, ⇒, s₀, AP, L_a) is the new, optimised, model defined as
follows: let S_a ⊆ S be the set of states reachable from the initial state s₀ under
⇒, let L_a(s) = L(s) for all s ∈ S_a, and define ⇒ ⊆ S_a × A × S_a inductively by
the rule

$$\frac{s \xrightarrow{\;\alpha\;} s'}{s \overset{\alpha}{\Longrightarrow} s'} \quad \text{if } \alpha \in \mathit{ample}(s)$$

If ample(s) satisfies conditions (C1)-(C4) as outlined above, then for each path
in M there exists a stutter-trace equivalent path in M^ample, and vice-versa,
denoted M ≜ M^ample.
The intuitive reason for this theorem to hold is the following: Assume an action
sequence αᵢ...αᵢ₊ₙβ that reaches the state s, and β is independent of {αᵢ, ..., αᵢ₊ₙ}.
Then, one can permute β with αᵢ₊ₙ through αᵢ successively n times. One can
therefore construct the sequence βαᵢ...αᵢ₊ₙ that also reaches the state s. If this
shift of β does not affect the labelling of the states with atomic propositions
(β is called invisible in this case), then it is not detectable by the property to
be shown, and the permuted and the original sequence are equivalent w.r.t. the
property, so only one of them has to be checked. One must, however, ensure
that in case of loops (infinite execution traces) the ample sets do not preclude
some actions from being fired altogether, which is why one needs (C4).

The more actions that are both stutter and provably independent (also 
referred to as safe actions [22]) there are, the smaller the transition system, 
and the more efficient the model checking. One of our contributions is that we 
attempt to identify as many safe actions as possible to make PORs more widely 
applicable to our model. 

The PORs in [35] consider only dependency and invisibility of recv and bar- 
rier actions, whereas we explore systematically all possibilities for applications 
of Theorem 1 to reduce the search space. When identifying safe actions, we
consider (1) the actual controller program CP, (2) the topology λ and (3) the state
formula φ to be shown invariant, which we call the context CTX of actions. It
turns out that two actions may be dependent in a given context of abstraction
while independent in another context, and similarly for invisibility, and we
exploit this fact. The argument of the action thus becomes relevant as well.


Definition 2 (Safe Actions). Given a context CTX = (CP, λ, φ), and SDN
model M_(λ,CP) = (S, A, →, s₀, AP, L), an action α(·) ∈ A(s) is called ‘safe’ if
it is independent of any other action in A and invisible for φ. We write safe
actions ά(·).

Definition 3 (Order-sensitive Controller Program). A controller program
CP is order-sensitive if there exists a state s ∈ S and two actions α, β
in {ctrl(·), bsync(·)} such that α, β ∈ A(s) and s -α→ s₁ -β→ s₂ and s -β→ s₃ -α→ s₄
with s₂ ≠ s₄.


Definition 4. Let φ be a state formula. An action α ∈ A is called ‘φ-invariant’
if s ⊨ φ iff α(s) ⊨ φ for all s ∈ S with α ∈ A(s).


Lemma 1. For transition system M_(λ,CP) = (S, A, →, s₀, AP, L) and a formula
φ ∈ LTL\{○}, α ∈ A is safe iff ⋀ᵢ Safeᵢ(α), where Safeᵢ, given in Table 1,
are per-row.


Proof. See [32] Appendix A. 
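
The commutation test behind Definition 3 can be sketched on a toy controller. This is a deliberate simplification under our own assumptions: only the controller state is compared (the definition compares full system states), every handler is assumed to be enabled in every state, and the handlers are hypothetical pure functions.

```python
# Sketch of an order-sensitivity check in the spirit of Definition 3.
# Controller states are plain values and handlers are pure functions;
# both are hypothetical examples, not taken from the paper.
from itertools import combinations
from typing import Callable, Hashable, Iterable, Sequence

Handler = Callable[[Hashable], Hashable]  # controller state -> controller state


def is_order_sensitive(states: Iterable[Hashable],
                       handlers: Sequence[Handler]) -> bool:
    """True if some pair of handlers reaches different states from the same
    start state depending on the order in which the two are executed."""
    for s in states:
        for a, b in combinations(handlers, 2):
            if b(a(s)) != a(b(s)):
                return True
    return False
```

For instance, a handler that increments a counter and one that resets it do not commute, whereas two increments do.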


Table 1. Safeness predicates

Action                 | Independence: Safe₁(α)    | Invisibility: Safe₂(α), Safe₃(α)
α = ctrl(sw, pk, cs)   | CP is not order-sensitive | if Q(q) occurs in φ, where q ∈ CS, then α is φ-invariant
α = bsync(sw, xid, cs) | CP is not order-sensitive | if Q(q) occurs in φ, where q ∈ CS, then α is φ-invariant
α = fwd(sw, pk, ports) | ⊤                         | if ∃pk∈b.q . P(pk) occurs in φ, for any b ∈ {sw} ∪ {λ(sw, p)₁ ∣ p ∈ ports} and q ∈ {pq, rcvq}, then α is φ-invariant
α = brepl(sw, xid)     | ⊤                         | ⊤
α = recv(h, pk)        |                           | if ∃pk∈h.rcvq . P(pk) occurs in φ, then α is φ-invariant


Theorem 2 (POR instance for SDN). Let (CP, λ, φ) be a context such that
M_(λ,CP) = (S, A, →, s₀, AP, L) is an SDN network model from Definition 1; and
let safe actions be as in Definition 2. Further, let ample(s) be defined by:

$$\mathit{ample}(s) = \begin{cases} \{\alpha \in A(s) \mid \alpha \text{ safe}\} & \text{if } \{\alpha \in A(s) \mid \alpha \text{ safe}\} \neq \emptyset \\ A(s) & \text{otherwise} \end{cases}$$

Then, ample satisfies the criteria of Theorem 1 and thus M_(λ,CP) ≜ M^ample_(λ,CP)⁵.


Proof.


C1 The (non)emptiness condition is trivial since by definition of ample(s) it
follows that ample(s) = ∅ iff A(s) = ∅.

C2 By assumption β ∈ A \ ample(s) depends on ample(s). But with our definition
of ample(s) this is impossible as all actions in ample(s) are safe and by
definition independent of all other actions.

C3 The validity of the invisibility condition is by definition of ample and safe
actions.

C4 We now show that every cycle in M^ample_(λ,CP) contains a fully expanded state s, i.e.
a state s such that ample(s) = A(s). By definition of ample(s) in Theorem 2
it is equivalent to show that there is no cycle in M^ample_(λ,CP) consisting of safe
actions only. We show this by contradiction, assuming such a cycle of only
safe actions exists. There are five safe action types to consider: ctrl, fwd, brepl,
bsync and recv. Distinguish two cases.


Case 1. A sequence of safe actions of same type. Let us consider the different
safe actions:


• Let ρ be an execution of M^ample_(λ,CP) which consists of only one type of ctrl-actions:

$$\rho = s_1 \overset{ctrl(pkt_1, cs_1)}{\Longrightarrow} s_2 \overset{ctrl(pkt_2, cs_2)}{\Longrightarrow} \cdots \; s_{i-1} \overset{ctrl(pkt_{i-1}, cs_{i-1})}{\Longrightarrow} s_i$$

Suppose ρ is a cycle. According to the ctrl semantics, for each transition
s -ctrl(pkt, cs)→ s′, where s = (π, δ, γ), s′ = (π′, δ′, γ′), it holds that γ′.rq =
γ.rq \ {pkt} as we use sets to represent rq buffers. Hence, for the execution
ρ it holds that γᵢ.rq = γ₁.rq \ {pkt₁, pkt₂, ..., pktᵢ₋₁}, which implies that s₁ ≠ sᵢ.
Contradiction.

• Let ρ be an execution which consists of only one type of fwd-actions: similar
argument as above since fq-s are represented by sets and thus forward
messages are removed from fq.

• Let ρ be an execution which consists of only one type of brepl-actions: similar
argument as above since control messages are removed from cq.

• Let ρ be an execution which consists of only one type of bsync-actions: similar
argument as above, as barrier reply messages are removed from brq-s that are
represented by sets.

• Let ρ be an execution which consists of only one type of recv-actions: similar
argument as above, as packets are removed from rcvq buffers that are
represented by sets.


⁵ Stutter equivalence here implicitly is defined w.r.t. the atomic propositions appearing
in φ, but this suffices as we are just interested in the validity of φ.


Case 2. A sequence of different safe actions. Suppose there exists a cycle with
mixed safe actions starting in s₁ and ending in sᵢ. Distinguish the following
cases.


i) There exists at least a ctrl and/or a bsync action in the cycle. According
to the effects of safe transitions, the ctrl action will change to a state with
smaller rq and the bsync will always switch to a state with smaller brq. It
is important here that ctrl does not interfere with bsync regarding rq, brq,
and no safe action of other type than ctrl and bsync accesses rq or brq. This
implies that s₁ ≠ sᵢ. Contradiction.

ii) Neither ctrl, nor bsync actions in the cycle.

a) There is a fwd and/or brepl in the cycle: fwd will always switch to a state
with smaller fq and brepl will always switch to a state with smaller cq
(brepl and recv do not interfere with fwd). This implies that s₁ ≠ sᵢ.
Contradiction.

b) There is neither fwd nor brepl in the cycle. This means that only recv is
in the cycle which is already covered by the first case.


Due to the definition of the transition system via ample sets, each
safe action is immediately executed after its enabling one. Therefore,
one can merge every transition of a safe action with its precursory
enabling one. Intuitively, the semantics of the merged action is defined
as the successive execution of its constituent actions. This process can be
repeated if there is a chain of safe actions; for instance, in the case of

$$s \xrightarrow{nomatch(sw, pkt)} s' \xrightarrow{ctrl(sw, pkt, cs)} s'' \xrightarrow{fwd(sw, pkt, ports)} s'''$$

where each transition enables the next and the last two are assumed to be safe.
These transitions can be merged into one, yielding a stutter equivalent trace as
the intermediate states are invisible (w.r.t. the context and thus the property
to be shown) by definition of safe actions.
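
The ample-set construction of Theorem 2 can be sketched as a reachability search that expands only safe actions whenever some are enabled. This is an illustration under our own assumptions, not the UPPAAL implementation: `enabled`, `step` and `is_safe` are hypothetical callbacks that a concrete model would have to supply, and safety of an action is taken as given rather than derived from Table 1.

```python
# Sketch of state-space exploration restricted to ample sets.
# `enabled`, `step` and `is_safe` are hypothetical callbacks of a model.
from typing import Callable, Hashable, List, Set

State = Hashable
Action = Hashable


def ample(enabled_actions: List[Action],
          is_safe: Callable[[Action], bool]) -> List[Action]:
    """Safe enabled actions if there are any, otherwise all enabled actions."""
    safe = [a for a in enabled_actions if is_safe(a)]
    return safe if safe else enabled_actions


def reachable(init: State,
              enabled: Callable[[State], List[Action]],
              step: Callable[[State, Action], State],
              is_safe: Callable[[Action], bool]) -> Set[State]:
    """States reachable when only ample actions are expanded."""
    seen, stack = {init}, [init]
    while stack:
        s = stack.pop()
        for a in ample(enabled(s), is_safe):
            t = step(s, a)
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen
```

On a toy system with two independent counters, declaring one increment safe collapses the grid of interleavings to a single path, which is the effect the reduction aims for.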


3.2 State Representation 


Efficient state representation is crucial for minimising MOCS's memory footprint
and enabling it to scale up to relatively large network setups.


Packet and Rule Indexing. In MOCS, only a single instance of each packet
and rule that can appear in the modelled network is kept in memory. An index
is then used to associate queues and flow tables with packets and rules, with a
single bit indicating their presence (or absence). This data structure is illustrated
in Fig. 3. For a data packet, a value of 1 in the pq section of the entry indicates
that infinitely many copies of it are stored in the packet queue of the respective switch.
A value of 1 in the fq section indicates that a single copy of the packet is stored
in the forward queue of the respective switch. A value of 1 in the rq section
indicates that a copy of the packet sent by the respective switch (when a nomatch
transition is fired) is stored in the controller’s request queue. For a rule, a value
of 1 in the ft section indicates that the rule is installed in the respective switch’s
flow table. A value of 1 in the cq section indicates that the rule is part of a
FlowMod message in the respective switch’s control queue.


[Figure: packet index entry with fields fq, rq, pq (location), dstIP, srcIP; rule
index entry with fields ft, cq, prio, out pt, in pt, dstIP, srcIP]


Fig. 3. Packet (left) and rule (right) indices


The proposed optimisation enables scaling up the network topology by minimising
the required memory footprint. For every switch, MOCS only requires a
few bits in each packet and rule entry in the index. 
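
The indexing scheme can be sketched roughly as follows. This is a Python illustration under our own assumptions, not the UPPAAL model: the class name `PacketEntry`, the bit layout and the method names are hypothetical. Each packet is stored once, and a single integer holds one presence bit per (switch, queue) pair.

```python
from dataclasses import dataclass

# Hypothetical queue names, mirroring the pq/fq/rq sections of a packet entry.
PACKET_QUEUES = ("pq", "fq", "rq")


@dataclass
class PacketEntry:
    """One shared packet instance plus one presence bit per (switch, queue)."""
    header: tuple   # the packet itself, kept in memory exactly once
    bits: int = 0   # presence bits for all switches, packed into one integer

    def _mask(self, switch: int, queue: str) -> int:
        # Assumed layout: each switch owns len(PACKET_QUEUES) consecutive bits.
        return 1 << (switch * len(PACKET_QUEUES) + PACKET_QUEUES.index(queue))

    def add(self, switch: int, queue: str) -> None:
        self.bits |= self._mask(switch, queue)

    def remove(self, switch: int, queue: str) -> None:
        self.bits &= ~self._mask(switch, queue)

    def present(self, switch: int, queue: str) -> bool:
        return bool(self.bits & self._mask(switch, queue))
```

Adding a switch to the topology costs three more bits per packet entry in this sketch, which is the kind of growth the paragraph above describes.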


Discovering Equivalence Classes of Packets. Model checking with all pos- 
sible packets, including all specified fields in the OpenFlow standard, would 
entail a huge state space that would render any approach unusable. Here, we 
propose the discovery of equivalence classes of packets that are then used for 
model checking. We first remove all fields that are not referenced in a statement 
or rule creation or deletion in the controller program. Then, we identify packet 
classes that would result in the same controller behaviour. Currently, as with the
rest of the literature, we focus on simple controller programs where such equivalence
classes can be easily identified by analysing static constraints and rule manipulation
in the controller program. We then generate one representative packet
from each class and assign it to all network switches that are directly connected
to end-hosts, i.e. modelling clients that can send an arbitrarily large number of
packets in a non-deterministic fashion. We use the minimum possible number of
bits to represent the identified equivalence classes. For example, if the controller
program behaves differently depending on whether the destination TCP port of a
packet is 22 (i.e. destined to an SSH server) or not, we only use a 1-bit field to
model this behaviour.
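
A minimal Python sketch of this idea is given below, assuming a controller that only distinguishes SSH from non-SSH traffic. The function names and the dictionary-based packet format are illustrative assumptions, not part of MOCS.

```python
def ssh_class(packet: dict) -> int:
    """Hypothetical abstraction: class 1 for SSH traffic, class 0 otherwise."""
    return 1 if packet["tcp_dst"] == 22 else 0


def representatives(packets, classify):
    """Keep the first packet seen for every equivalence class."""
    reps = {}
    for pkt in packets:
        reps.setdefault(classify(pkt), pkt)
    return reps


def bits_needed(num_classes: int) -> int:
    """Minimum number of bits that can distinguish num_classes classes."""
    return max(1, (num_classes - 1).bit_length())


# Four concrete packets collapse into two classes, i.e. a 1-bit field.
packets = [{"tcp_dst": p} for p in (22, 80, 443, 22)]
reps = representatives(packets, ssh_class)
```

Only the representatives in `reps` would then be handed to the model checker in this sketch; all other packets behave identically to one of them as far as the assumed controller is concerned.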
Bit Packing. We reduce the size of each recorded state by employing bit packing
using the int type supported by UPPAAL, and bit-level operations for the entries 
in the packet and rule indices as well as for the packets and rules themselves. 
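
Bit packing of this kind can be illustrated with the following Python sketch. The field names and widths in `LAYOUT` are made up for the example and do not reproduce the layout used in the UPPAAL model.

```python
# Hypothetical field layout: (name, width in bits), least significant first.
LAYOUT = (("src", 2), ("dst", 2), ("ssh", 1))


def pack(fields: dict) -> int:
    """Pack all fields of LAYOUT into a single integer."""
    word, shift = 0, 0
    for name, width in LAYOUT:
        value = fields[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << shift
        shift += width
    return word


def unpack(word: int) -> dict:
    """Recover the individual fields from a packed integer."""
    fields, shift = {}, 0
    for name, width in LAYOUT:
        fields[name] = (word >> shift) & ((1 << width) - 1)
        shift += width
    return fields
```

With this layout a packet occupies five bits instead of three separate integer variables, and reading a field is a shift followed by a mask.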


4 Experimental Evaluation 


In this section, we experimentally evaluate MOCS by comparing it with the 
state of the art, in terms of performance (verification throughput and mem- 
ory footprint) and model expressivity. We have implemented MOCS in UPPAAL 
[6] as a network of parallel automata for the controller and network switches, 
which communicate asynchronously by writing/reading packets to/from queues 
that are part of the model discussed in Sect. 2. As discussed in Sect. 3, this is
implemented by directly manipulating the packet and rule indices. 

Throughout this section we will be using three examples of network con- 
trollers: (1) A stateless firewall ([32] Appendix B-CP1) requires the controller to 
install rules to network switches that enable them to decide whether to forward 
a packet towards its destination or not; this is done in a stateless fashion, i.e. 
without having to consider any previously seen packets. For example, a controller 
could configure switches to block all packets whose destination TCP port is 22 
(i.e. destined to an SSH server). (2) A stateful firewall ([32] Appendix B-CP2) 
is similar to the stateless one but decisions can take into account previously 
seen packets. A classic example of this is to allow bi-directional communication 
between two end-hosts, when one host opens a TCP connection to the other. 
Then, traffic flowing from the other host back to the connection initiator should 
be allowed to go through the switches on the reverse path. (3) A MAC learning 
application ([32] Appendix B-CP3) enables the controller and switches to learn 
how to forward packets to their destinations (identified with respective MAC 
addresses). A switch sends a PacketIn message to the controller when it receives 
a packet that it does not know how to forward. By looking at this packet, the 
controller learns a mapping of a source switch (or host) to a port of the request- 
ing switch. It then installs a rule (by sending a FlowMod message) that will allow 
that switch to forward packets back to the source switch (or host), and asks the 
requesting switch (by sending a PacketOut message) to flood the packet to all 
its ports except the one it received the packet from. This way, the controller 
eventually learns all mappings, and network switches receive rules that enable 
them to forward traffic to their neighbours for all destinations in the network. 
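
The MAC learning behaviour described above can be sketched as a PacketIn handler. This is a simplified Python illustration with hypothetical message tuples and names; it is not the controller program from [32].

```python
class MacLearningController:
    """Toy MAC learning controller; message formats are illustrative only."""

    def __init__(self):
        self.table = {}  # (switch, MAC address) -> port on that switch

    def packet_in(self, switch, in_port, src_mac):
        """Handle a PacketIn and return the messages sent back to the switch."""
        # Learn on which port of the requesting switch the source is reachable.
        self.table[(switch, src_mac)] = in_port
        return [
            # Rule that lets the switch forward traffic back to the source.
            ("FlowMod", switch, {"dst": src_mac, "out_port": in_port}),
            # Flood the packet to all ports except the one it arrived on.
            ("PacketOut", switch, {"flood_except": in_port}),
        ]
```

Each PacketIn therefore teaches the controller one mapping and installs one rule, so repeated flooding eventually populates the flow tables for all destinations.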


4.1 Performance Comparison 


We measure MOCS’s performance, and also compare it against Kuai [35]⁶ using
the examples described above, and we investigate the behaviour of MOCS as 
we scale up the network (switches and clients/servers). We report three metrics: 


6 Note that parts of Kuai's source code are not publicly available; we therefore
implemented its model in UPPAAL.
Fig. 4. Performance comparison: verification throughput


(1) verification throughput in visited states per second, (2) number of visited 
states, and (3) required memory. We have run all verification experiments on an 
18-core iMac Pro with a 2.3 GHz Intel Xeon W processor and 128 GB of DDR4 memory.


Verification Throughput. We measure the verification throughput when run- 
ning a single experiment at a time on one CPU core and report the average and 
standard deviation for the first 30 min of each run. In order to assess how MOCS’s 
different optimisations affect its performance, we report results for the following 
system variants: (1) MOCS, (2) MOCS without POR, (3) MOCS without any 
optimisations (neither POR nor state representation), and (4) Kuai. Figure 4 shows
the measured throughput (with error bars denoting standard deviation). 

For the MAC learning and stateless firewall applications, we observe that 
MOCS performs significantly better than Kuai for all different network setups 
and sizes⁷, achieving at least double Kuai's throughput. The throughput
is also much better for the stateful firewall. This is despite the fact
that, for this application, Kuai employs the unrealistic optimisation where the 
barrier transition forces the immediate update of the forwarding state. In other 
words, MOCS is able to explore significantly more states and identify bugs that 
Kuai cannot (see Sect. 4.2). 

The computational overhead induced by our proposed PORs is minimal. This 
overhead occurs when PORs require dynamic checks through the safety predicates
described in Table 1. This is shown in Fig. 4a, where, in order to decide
about the (in)visibility of fwd(sw,pk,pt) actions, a lookup is performed in the 
history-array of packet pk, checking whether the bit which corresponds to switch 
sw’, which is connected with port pt of sw, is set. On the other hand, if a POR 
does not require any dynamic checks, no penalty is induced, as shown in Figs. 4b 


7 S × H in Figs. 4 to 6 indicates the number of switches S and hosts H.
Fig. 5. Performance comparison: visited states (logarithmic scale)
Fig. 6. Performance comparison: memory footprint (logarithmic scale)


and 4c, where the throughput when the PORs are disabled is almost identical to 
the case where PORs are enabled. This is because it has been statically estab- 
lished at a pre-analysis stage that all actions of a particular type are always safe 
for any argument/state. It is important to note that even when computational 
overhead is induced, PORs enable MOCS to scale up to larger networks because 
the number of visited states can be significantly reduced, as discussed below. 

In order to assess the contribution of the state representation optimisations to
MOCS's performance, we measure the throughput when both PORs and state
representation optimisations are disabled. It is clear that they contribute significantly
to the overall throughput; without them, the measured throughput was
less than half the throughput with them enabled.


Number of Visited States and Required Memory. Minimising the num- 
ber of visited states and required memory is crucial for scaling up verification to 
larger networks. The proposed partial order reductions (Sect. 3.1) and identification
of packet equivalence classes aim at the former, while packet/rule indexing
and bit packing aim at the latter (Sect. 3.2). In Fig. 5, we present the results for the
various setups and network deployments discussed above. We stopped scaling up
the network deployment for each setup when the verification process required
more than 24 h or started swapping memory to disk. For these cases we killed
the process and report a topped-up bar in Figs. 5 and 6.

For the MAC learning application, MOCS can scale up to larger network 
deployments compared to Kuai, which could not verify networks consisting of 
more than 2 hosts and 6 switches. For that network deployment, Kuai visited
~7 million states, whereas MOCS visited only ~193k states. At the same time, Kuai
required around 48 GB of memory (7061 bytes/state) whereas MOCS needed
~43 MB (228 bytes/state). Without the partial order reductions, MOCS can
only verify tiny networks. The contribution of the proposed state representation
optimisations is also crucial; in our experiments (results not shown due to lack of
space), for the 6 × 2 network setups (the largest we could do without these optimisations),
we observed a reduction in state space (due to the identification of
packet equivalence classes) and memory footprint (due to packet/rule indexing
and bit packing) from ~7 million to ~200k states and from ~6 KB per state to ~230 B
per state. For the stateless and stateful firewall applications, MOCS
scales up as well as Kuai.
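
As a quick plausibility check of the figures quoted for the MAC learning application, the total footprint is the number of visited states times the bytes per state. The state counts in the text are rounded, so the Python sketch below can only reproduce the reported totals approximately.

```python
def footprint_bytes(states: int, bytes_per_state: int) -> int:
    """Total memory needed to store all visited states."""
    return states * bytes_per_state


# Rounded figures from the text for the 6 x 2 MAC learning setup.
kuai = footprint_bytes(7_000_000, 7061)  # 49,427,000,000 bytes, tens of GB
mocs = footprint_bytes(193_000, 228)     # 44,004,000 bytes, tens of MB

# Overall reduction factor between the two models.
ratio = kuai / mocs
```

The reduction of roughly three orders of magnitude comes from both factors at once: about 36 times fewer states and about 31 times fewer bytes per state.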


4.2 Model Expressivity 


The proposed model is significantly more expressive compared to Kuai as it 
allows for more asynchronous concurrency. To begin with, in MOCS, controller 
messages sent before a barrier request message can be interleaved with all other 
enabled actions, other than the control messages sent after the barrier. By con- 
trast, Kuai always flushes all control messages until the last barrier in one go, 
masking a large number of interleavings and, potentially, buggy behaviour. Next, 
in MOCS, nomatch, ctrl and fwd can be interleaved with other actions. Kuai instead
enforces a mutual exclusion concurrency control policy through the wait-semaphore:
whenever a nomatch occurs the mutex is locked and it is unlocked by
the fwd action of the thread nomatch-ctrl-fwd which refers to the same packet; 
all other threads are forced to wait. Moreover, MOCS does not impose any limit 
on the size of the rq queue, in contrast to Kuai where only one packet can exist 
in it. In addition, Kuai does not support notifications from the data plane to 
the controller for completed operations as it does not support reply messages 
and as a result any bug related to the fact that the controller is not synced to 
data-plane state changes is hidden.⁸ Also, our specification language for states is
more expressive than Kuai's, as we can use any property in LTL without “next”,
whereas Kuai only uses invariants with a single outermost $\Box$.

The MOCS extensions, however, are conservative with respect to Kuai; that
is, we have the following theorem (without proof, which is straightforward):


8 There are further small extensions; for instance, in MOCS the controller can send 
multiple PacketOut messages (as OpenFlow prescribes). 
Theorem 3 (MOCS Conservativity). Let $\mathcal{M}_{(\lambda,CP)} = (S, A, \hookrightarrow, s_0, AP, L)$
and $\mathcal{M}^{K}_{(\lambda,CP)} = (S_K, A_K, \hookrightarrow_K, s_0, AP, L)$ be the original SDN models of MOCS
and Kuai, respectively, using the same topology and controller. Furthermore,
let $Traces(\mathcal{M}_{(\lambda,CP)})$ and $Traces(\mathcal{M}^{K}_{(\lambda,CP)})$ denote the set of all initial traces in
these models, respectively. Then, $Traces(\mathcal{M}^{K}_{(\lambda,CP)}) \subseteq Traces(\mathcal{M}_{(\lambda,CP)})$.


For each of the extensions mentioned above, we briefly describe an example 
(controller program and safety property) that exhibits a bug which cannot
occur in Kuai.


Control Message Reordering Bug. Let us consider a stateless firewall in 
Fig. 7a (controller is not shown), which is supposed to block incoming SSH packets
from reaching the server (see [32] Appendix B-CP1). Formally, the safety
property to be checked here is $\Box(\forall pkt \in S.rcvq \,.\, \neg pkt.ssh)$. Initially, flow tables
are empty. Switch A sends a PacketIn message to the controller when it receives 
the first packet from the client (as a result of a nomatch transition). The con- 
troller, in response to this request (and as a result of a ctrl transition), sends the 
following FlowMod messages to switch A: rule r1 has the highest priority and
drops all SSH packets, rule r2 sends all packets from port 1 to port 2, and rule r3 
sends all packets from port 2 to port 1. If the packet that triggered the transition 
above is an SSH one, the controller drops it, otherwise, it instructs (through a 
PacketOut message) A to forward the packet to S. A bug-free controller should 
ensure that r1 is installed before any other rule, therefore it must send a barrier 
request after the FlowMod message that contains r1. If, by mistake, the Flow- 
Mod message for r2 is sent before the barrier request, A may install r2 before r1, 
which will result in violating the given property. MOCS is able to capture this 
buggy behaviour as its semantics allows control messages prior to the barrier to 
be processed in an interleaved manner.
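
The effect of the missing barrier can be reproduced with a small Python sketch that enumerates every order in which switch A may install the rules. The rule encoding and function names are assumptions made for this example; rules in the same batch are not separated by a barrier and may therefore be installed in any order.

```python
from itertools import permutations

# Hypothetical encodings of r1 (drop SSH, high priority) and r2 (port 1 -> 2).
R1 = {"name": "r1", "prio": 2, "match": lambda p: p["ssh"], "action": "drop"}
R2 = {"name": "r2", "prio": 1, "match": lambda p: p["in_port"] == 1,
      "action": "fwd:2"}


def lookup(table, pkt):
    """Action of the highest-priority matching rule, or 'nomatch'."""
    hits = [r for r in table if r["match"](pkt)]
    return max(hits, key=lambda r: r["prio"])["action"] if hits else "nomatch"


def install_orders(batches):
    """All installation orders; consecutive batches are separated by barriers."""
    if not batches:
        yield []
        return
    for head in permutations(batches[0]):
        for tail in install_orders(batches[1:]):
            yield list(head) + tail


def violates(batches):
    """True if, in some order, an SSH packet arriving on port 1 is forwarded."""
    ssh_pkt = {"ssh": True, "in_port": 1}
    for order in install_orders(batches):
        table = []
        for rule in order:
            table.append(rule)
            if lookup(table, ssh_pkt).startswith("fwd"):
                return True
    return False
```

In this sketch `violates([[R1], [R2]])` models the bug-free controller (a barrier after r1) and `violates([[R1, R2]])` models the faulty one, where r2 may be installed first and an SSH packet slips through.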




Fig. 7. Two networks with (a) two switches, and (b) n stateful firewall replicas 


Wrong Nesting Level Bug. Consider a correct controller program that 
enforces that server S (Fig. 7a) is not accessible through SSH. Formally, the safety 
property to be checked here is $\Box(\forall pkt \in S.rcvq \,.\, \neg pkt.ssh)$. For each incoming
PacketIn message from switch A, it checks if the enclosed packet is an SSH one 
and destined to S. If not, it sends a PacketOut message instructing A to forward 
the packet to S. It also sends a FlowMod message to A with a rule that allows 
packets of the same protocol (not SSH) to reach S. In the opposite case (SSH), it 
checks (a Boolean flag) whether it had previously sent drop rules for SSH packets 
to the switches. If not, it sets flag to true, sends a FlowMod message with a rule 
that drops SSH packets to A and drops the packet. Note that this inner block
does not have an else statement. 

A fairly common error is to write a statement at the wrong nesting level ([32] 
Appendix B-CP4). Such a mistake can be built into the above program by nesting 
the outer else branch in the inner if block, such that it is executed any time 
an SSH-packet is encountered but the SSH drop-rule has already been installed 
(i.e. flag f is true). Now, the SSH drop rule, once installed in switch A, immediately
disables a potential nomatch(A, p) with p.ssh = true that would have sent
packet p to the controller, but if it has not yet been installed, a second incoming
SSH packet would lead to the execution of the else statement of the inner branch.
This would violate the property defined above, as p will be forwarded to S⁹.

MOCS can uncover this bug because of the correct modelling of the controller 
request queue and the asynchrony between the concurrent executions of control 
messages sent before a barrier. Otherwise, the second packet that triggers the 
execution of the wrong branch would not have appeared in the buffer before 
the first one had been dealt with by the controller. Furthermore, if all rules in 
messages up to a barrier were installed synchronously, the second packet would 
be dealt with correctly, so no bug could occur. 
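
The two controller variants can be contrasted in a short Python sketch. The handler names, message tuples and the `state` dictionary are illustrative assumptions rather than the program from [32]; only the nesting of the forwarding branch differs between the two functions.

```python
def correct_controller(pkt, state):
    """Forwarding branch belongs to the outer test (packet is not SSH)."""
    msgs = []
    if pkt["ssh"]:
        if not state["flag"]:
            state["flag"] = True
            msgs.append(("FlowMod", "A", "drop_ssh"))
        # No inner else: the SSH packet itself is simply dropped.
    else:
        msgs.append(("PacketOut", "A", "forward_to_S"))
        msgs.append(("FlowMod", "A", "allow_protocol"))
    return msgs


def buggy_controller(pkt, state):
    """Same code, but the forwarding branch is nested one level too deep."""
    msgs = []
    if pkt["ssh"]:
        if not state["flag"]:
            state["flag"] = True
            msgs.append(("FlowMod", "A", "drop_ssh"))
        else:
            # Misplaced branch: runs for SSH packets once the flag is set.
            msgs.append(("PacketOut", "A", "forward_to_S"))
            msgs.append(("FlowMod", "A", "allow_protocol"))
    return msgs
```

The bug only shows when two SSH packets reach the controller before the drop rule takes effect in A, which is why a request queue holding more than one packet is needed to expose it.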


Inconsistent Update Bug. OpenFlow's barrier and barrier reply mechanisms 
allow for updating multiple network switches in a way that enables consistent 
packet processing, i.e., a packet cannot see a partially updated network where 
only a subset of switches have changed their forwarding policy in response to 
this packet (or any other event), while others have not done so. MOCS is expres- 
sive enough to capture this behaviour and related bugs. In the topology shown 
in Fig. 7a, let us assume that, by default, switch B drops all packets destined 
to S. Any attempt to reach S through A is examined separately by the controller
and, when access is granted, a relevant rule is installed at both switches
(e.g. allowing all packets from C destined to S for given source and destination
ports). Updates must be consistent, therefore the packet cannot be forwarded 
by A and dropped by B. Both switches must have the new rules in place, before 
the packet is forwarded. To do so, the controller, ([32] Appendix B-CP5), upon 
receiving a PacketIn message from the client's switch, sends the relevant rule to 
switch B (FlowMod) along with the respective barrier (BarrierReq) and temporarily
stores the packet that triggered this update. Only after receiving a BarrierRes
message from B will the controller forward the previously stored packet back
to A along with the relevant rule. This update is consistent and the packet is 
guaranteed to reach S. A (rather common) bug would be one where the con- 
troller installs the rules to both switches and at the same time forwards the 
packet to A. In this case, the packet may end up being dropped by B, if it 
arrives and gets processed before the relevant rule is installed, and therefore the 
invariant $\Box([drop(pkt, sw)] \,.\, \neg(pkt.dest = S))$, where [drop(pkt, sw)] is a quantifier
that binds dropped packets (see definition in [32] Appendix B-CP5), would


9 Here, we assume that the controller looks up a static forwarding table before sending
PacketOut messages to switches. 
be violated. For this example, it is crucial that MOCS supports barrier response
messages. 
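
The difference between the consistent and the faulty update can be sketched in Python as two message sequences. The class, the message tuples and the use of the packet identifier to match a barrier response to its request are assumptions made for illustration only.

```python
class ConsistentController:
    """Releases the stored packet only after switch B confirms its rule."""

    def __init__(self):
        self.pending = {}  # packet id -> rule waiting for B's confirmation

    def packet_in(self, pkt_id, rule):
        # Update B first and hold the packet back.
        self.pending[pkt_id] = rule
        return [("FlowMod", "B", rule), ("BarrierReq", "B", pkt_id)]

    def barrier_res(self, pkt_id):
        # B has installed the rule: now update A and release the packet.
        rule = self.pending.pop(pkt_id)
        return [("FlowMod", "A", rule), ("PacketOut", "A", pkt_id)]


def buggy_packet_in(pkt_id, rule):
    """Faulty variant: everything is sent at once, without any barrier."""
    return [("FlowMod", "B", rule), ("FlowMod", "A", rule),
            ("PacketOut", "A", pkt_id)]
```

In the faulty variant nothing orders the PacketOut to A after the rule installation in B, so the packet can reach B while B still drops traffic destined to S.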


5 Conclusion 


We have shown that an OpenFlow-compliant SDN model, with the right optimisations,
can be model checked to discover subtle real-world bugs. We demonstrated
that MOCS can capture real-world bugs in a more complicated semantics without
sacrificing performance.

But this is not the end of the line. One could automatically compute equivalence
classes of packets that cover all behaviours (which we still compute
manually). To what extent the size of the topology can be restricted to find
bugs in a given controller is another interesting research question, as is the anal- 
ysis of the number and length of interleavings necessary to detect certain bugs. 
In our examples, all bugs were found in less than a second. 


References 


1. Al-Fares, M., Radhakrishnan, S., Raghavan, B.: Hedera: dynamic flow scheduling 
for data center networks. In: NSDI (2010) 

2. Al-Shaer, E., Al-Haj, S.: FlowChecker: configuration analysis and verification of 
federated OpenFlow infrastructures. In: SafeConfig (2010) 

3. Albert, E., Gómez-Zamalloa, M., Rubio, A., Sammartino, M., Silva, A.: SDN- 
Actors: modeling and verification of SDN programs. In: Havelund, K., Peleska, J., 
Roscoe, B., de Vink, E. (eds.) FM 2018. LNCS, vol. 10951, pp. 550-567. Springer, 
Cham (2018) 

4. Baier, C., Katoen, J.P.: Principles of Model Checking. The MIT Press, Cambridge 
(2008) 

5. Ball, T., Bjørner, N., Gember, A., et al.: VeriCon: towards verifying controller
programs in software-defined networks. In: PLDI (2014) 

6. Behrmann, G., David, A., Larsen, K.G., et al.: Developing UPPAAL over 15 years. 
Softw. Pract. Exp. (2011)

7. Braga, R., Mota, E., Passito, A.: Lightweight DDoS flooding attack detection using 
NOX/OpenFlow. In: LCN (2010) 

8. Canini, M., Venzano, D., Perešíni, P., et al.: A NICE way to test OpenFlow applications.
In: NSDI (2012)

9. Cimatti, A., et al.: NuSMV 2: an OpenSource tool for symbolic model checking. 
In: Brinksma, E., Larsen, K.G. (eds.) CAV 2002. LNCS, vol. 2404, pp. 359-364. 
Springer, Heidelberg (2002) 

10. Curtis, A.R., Mogul, J.C., Tourrilhes, J., et al.: DevoFlow: scaling flow management
for high-performance networks. In: SIGCOMM (2011) 

11. Dobrescu, M., Argyraki, K.: Software dataplane verification. In: Communications 
of the ACM (2015) 

12. El-Hassany, A., Tsankov, P., Vanbever, L., Vechev, M.: Network-wide configuration 
synthesis. In: Majumdar, R., Kunčak, V. (eds.) CAV 2017. LNCS, vol. 10427, pp.
261-281. Springer, Cham (2017) 


13. Fayaz, S.K., Sharma, T., Fogel, A., et al.: Efficient network reachability analysis
using a succinct control plane representation. In: OSDI (2016)

14. Fayaz, S.K., Yu, T., Tobioka, Y., et al.: BUZZ: testing context-dependent policies
in stateful networks. In: NSDI (2016)

15. Feamster, N., Rexford, J., Shenker, S., et al.: SDX: A Software-defined Internet
Exchange. Open Networking Summit (2013)

16. Feamster, N., Rexford, J., Zegura, E.: The road to SDN. SIGCOMM Comput.
Commun. Rev. (2014)

17. Fischer, M.J., Ladner, R.E.: Propositional dynamic logic of regular programs. J.
Comput. Syst. Sci. 18, 194-211 (1979)

18. Fogel, A., Fung, S., Angeles, L., et al.: A general approach to network configuration
analysis. In: NSDI (2015)

19. Handigol, N., Seetharaman, S., Flajslik, M., et al.: Plug-n-Serve: load-balancing
web traffic using OpenFlow. In: SIGCOMM (2009)

20. Havelund, K., Pressburger, T.: Model checking JAVA programs using JAVA
PathFinder. STTT 2, 366-381 (2000)

21. Holzmann, G.J.: The model checker SPIN. IEEE Trans. Softw. Eng. 23, 279-295
(1997)

22. Holzmann, G.J., Peled, D.: An improvement in formal verification. In: Hogrefe, D.,
Leue, S. (eds.) Formal Description Techniques VII. IAICT, pp. 197-211. Springer,
Boston, MA (1995)

23. Horn, A., Kheradmand, A., Prasad, M.R.: Delta-net: real-time network verification
using atoms. In: NSDI (2017)

24. Hu, H., Ahn, G.J., Han, W., et al.: Towards a reliable SDN firewall. In: ONS (2014)

25. Jackson, D.: Alloy: a lightweight object modelling notation. ACM Trans. Softw.
Eng. Methodol. 11, 256-290 (2002)

26. Jafarian, J.H., Al-Shaer, E., Duan, Q.: OpenFlow random host mutation: transparent
moving target defense using software defined networking. In: HotSDN (2012)

27. Jain, S., Zhu, M., Zolla, J., et al.: B4: experience with a globally-deployed software
defined WAN. In: SIGCOMM (2013)

28. Jia, Y.: NetSMC: a symbolic model checker for stateful network verification. In:
NSDI (2020)

29. Kazemian, P., Chang, M., Zeng, H., et al.: Real time network policy checking using
header space analysis. In: NSDI (2013)

30. Kazemian, P., Varghese, G., McKeown, N.: Header space analysis: static checking
for networks. In: NSDI (2012)

31. Khurshid, A., Zou, X., Zhou, W., et al.: VeriFlow: verifying network-wide invariants
in real time. In: NSDI (2013)

32. Klimis, V., Parisis, G., Reus, B.: Towards model checking real-world software-defined
networks (version with appendix). preprint arXiv:2004.11988 (2020)

33. Li, Y., Yin, X., Wang, Z., et al.: A survey on network verification and testing with
formal methods: approaches and challenges. IEEE Surv. Tutorials 21, 940-969
(2019)

34. Mai, H., Khurshid, A., Agarwal, R., et al.: Debugging the data plane with anteater.
In: SIGCOMM (2011)

35. Majumdar, R., Deep Tetali, S., Wang, Z.: Kuai: a model checker for software-defined
networks. In: FMCAD (2014)

36. McClurg, J., Hojjat, H., Cerny, P., et al.: Efficient synthesis of network updates.
In: PLDI (2015)

37. McKeown, N., Anderson, T., Balakrishnan, H., et al.: OpenFlow: enabling innovation
in campus networks. SIGCOMM Comput. Commun. Rev. 38, 69-74 (2008)




38. Patel, P., Bansal, D., Yuan, L., et al.: Ananta: cloud scale load balancing. SIG- 
COMM 43, 207-218 (2013) 

39. Peled, D.: All from one, one for all: on model checking using representatives. In: 
Courcoubetis, C. (ed.) CAV 1993. LNCS, vol. 697, pp. 409-423. Springer, Heidel- 
berg (1993) 

40. Plotkin, G.D., Bjørner, N., Lopes, N.P., et al.: Scaling network verification using 
symmetry and surgery. In: POPL (2016) 

41. Pnueli, A., Xu, J., Zuck, L.: Liveness with (0,1,∞)-counter abstraction. In:
Brinksma, E., Larsen, K.G. (eds.) CAV 2002. LNCS, vol. 2404, pp. 107-122. 
Springer, Heidelberg (2002) 

42. Pratt, V.R.: Semantical considerations on Floyd-Hoare logic. In: FOCS (1976) 

43. Sethi, D., Narayana, S., Malik, S.: Abstractions for model checking SDN controllers. 
In: FMCAD (2013) 

44. Shenker, S., Casado, M., Koponen, T., et al.: The future of networking, and the 
past of protocols. In: ONS (2011). https://tinyurl.com/yxnuxobt 

45. Son, S., Shin, S., Yegneswaran, V., et al.: Model checking invariant security prop- 
erties in OpenFlow. In: IEEE (2013) 

46. Stoenescu, R., Popovici, M., Negreanu, L., et al.: SymNet: scalable symbolic exe- 
cution for modern networks. In: SIGCOMM (2016) 

47. Varghese, G.: Vision for network design automation and network verification. In:
NetPL (Talk) (2018). https://tinyurl.com/y2cnhvhf 

48. Yang, H., Lam, S.S.: Real-time verification of network properties using atomic 
predicates. IEEE/ACM Trans. Network. 24, 887-900 (2016) 

49. Zeng, H., Kazemian, P., Varghese, G., et al.: A survey on network troubleshooting. 
Technical report TR12-HPNG-061012, Stanford University (2012) 

50. Zeng, H., Zhang, S., Ye, F., et al.: Libra: divide and conquer to verify forwarding 
tables in huge networks. In: NSDI (2014) 

51. Zhang, S., Malik, S.: SAT based verification of network data planes. In: Van Hung, 
D., Ogawa, M. (eds.) ATVA 2013. LNCS, vol. 8172, pp. 496-505. Springer, Cham 
(2013) 


Open Access This chapter is licensed under the terms of the Creative Commons 
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/),
which permits use, sharing, adaptation, distribution and reproduction in any medium 
or format, as long as you give appropriate credit to the original author(s) and the 
source, provide a link to the Creative Commons license and indicate if changes were 
made. 

The images or other third party material in this chapter are included in the 
chapter's Creative Commons license, unless indicated otherwise in a credit line to the 
material. If material is not included in the chapter's Creative Commons license and 
your intended use is not permitted by statutory regulation or exceeds the permitted 
use, you will need to obtain permission directly from the copyright holder. 


Software Verification 




Code2Inv: A Deep Learning Framework 
for Program Verification 


Xujie Si¹(✉), Aaditya Naik¹, Hanjun Dai², Mayur Naik¹, and Le Song³


¹ University of Pennsylvania, Philadelphia, USA
xsi@cis.upenn.edu
² Google Brain, Mountain View, USA
³ Georgia Institute of Technology, Atlanta, USA


Abstract. We propose a general end-to-end deep learning framework 
Code2Inv, which takes a verification task and a proof checker as input, 
and automatically learns a valid proof for the verification task by inter- 
acting with the given checker. Code2Inv is parameterized with an embed- 
ding module and a grammar: the former encodes the verification task 
into numeric vectors while the latter describes the format of solutions 
Code2Inv should produce. We demonstrate the flexibility of Code2Inv by 
means of two small-scale yet expressive instances: a loop invariant syn- 
thesizer for C programs, and a Constrained Horn Clause (CHC) solver. 


1 Introduction 


A central challenge in automating program verification lies in effective proof 
search. Counterexample-guided Inductive Synthesis (CEGIS) [3,4,17,31,32] has 
emerged as a promising paradigm for solving this problem. In this paradigm, a 
generator proposes a candidate solution, and a checker determines whether the 
solution is correct or not; in the latter case, the checker provides a counterex- 
ample to the generator, and the process repeats. 
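
The generator/checker loop of this paradigm can be sketched in a few lines of Python. The function names and the toy problem (finding an upper bound that holds for every reachable value) are assumptions chosen for illustration; they are not part of Code2Inv.

```python
def cegis(propose, check, max_rounds=100):
    """Generic CEGIS loop: propose a candidate, check it, learn from failures."""
    counterexamples = []
    for _ in range(max_rounds):
        candidate = propose(counterexamples)
        cex = check(candidate)
        if cex is None:          # the checker accepts the candidate
            return candidate
        counterexamples.append(cex)
    return None                  # budget exhausted without a valid candidate


# Toy instance: find a bound c such that "x <= c" holds for all reachable x.
REACHABLE = range(11)


def propose(counterexamples):
    # Smallest bound consistent with every counterexample seen so far.
    return max(counterexamples, default=0)


def check(bound):
    # Return some reachable value violating "x <= bound", or None if none does.
    return next((x for x in REACHABLE if x > bound), None)


bound = cegis(propose, check)
```

Here the generator is a fixed rule; the approaches discussed in the text differ mainly in how `propose` is realised (sampling, enumeration, decision trees, or a learned policy).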

Finding loop invariants is arguably the most crucial part of proof search 
in program verification. Recent works [2,9,10,26,29,38] have instantiated the 
CEGIS paradigm for synthesizing loop invariants. Since checking loop invariants 
is a relatively standard process, these works target generating loop invariants 
using various approaches, such as stochastic sampling [29], syntax-guided enu- 
meration [2,26], and decision trees with templates [9,10] or linear classifiers [38]. 
Although these works have greatly advanced the state of the art in program verification,
there remains significant room for improvement in practice.

We set out to build a CEGIS-based program verification framework and iden- 
tified five key objectives that it must address to be useful: 


— The proof search should automatically evolve according to a given verification 
task as opposed to using exhaustive enumeration or a fixed set of search 
heuristics common in existing approaches. 
— The framework should be able to transfer knowledge across programs, that is,
past runs should boost performance on similar programs in the future, which 
is especially relevant for CI/CD settings [15, 20, 25]. 

— The framework should be able to adapt to generate different kinds of invari- 
ants (e.g. non-linear or with quantifiers) beyond linear invariants predomi- 
nantly targeted by existing approaches. 

— The framework should be extensible to a new domain (e.g. constraint solving- 
based) by simply switching the underlying checker. 

— The generated invariants should be natural, e.g. avoid overfitting due to 
human-induced biases in the proof search heuristic or invariant structure 
commonly imposed through templates. 


We present Code2Inv, an end-to-end deep learning framework which aims 
to realize the above objectives. Code2Inv has two key differences compared to 
existing CEGIS-based approaches. First, instead of simply focusing on counterex- 
amples but ignoring program structure, Code2Inv learns a neural representation 
of program structure by leveraging graph neural networks [8,11,19,28], which 
enable it to capture structural information and thereby generalize to different but
structurally similar programs. Secondly, Code2Inv reduces loop invariant gener- 
ation into a deep reinforcement learning problem [22,34]. No search heuristics or 
training labels are needed from human experts; instead, a neural policy for loop 
invariant generation can be automatically learned by interacting with the given 
proof checker on the fly. The learnable neural policy generates a loop invariant 
by taking a sequence of actions, which can be flexibly controlled by a grammar 
that defines the structure of loop invariants. This decoupling of the action defini- 
tion from policy learning enables Code2Inv to adapt to different loop invariants 
or other reasoning tasks in a new domain with almost no changes except for 
adjusting the grammar or the underlying checker. 

We summarize our contributions as follows: 


— We present a framework for program verification, Code2Inv, which leverages 
deep learning and reinforcement learning through the use of graph neural networks,
tree-structured long short-term memory networks, attention mechanisms,
and policy gradients.

— We show two small-scale yet expressive instances of Code2Inv: a loop invariant 
synthesizer for C programs and a Constrained Horn Clause (CHC) solver. 

— We evaluate Code2Inv on a suite of 133 C programs from SyGuS [2] by com- 
paring its performance with three state-of-the-art approaches and showing 
that the learned neural policy can be transferred to similar programs. 

— We perform two case studies showing the flexibility of Code2Inv on different 
classes of loop invariants. We also perform a case study on the naturalness of 
the loop invariants generated by various approaches. 


2 Background 


In this section, we introduce artificial neural network concepts used by Code2Inv. 
A multilayer perceptron (MLP) is a basic neural network model which can 
approximate an arbitrary continuous function $y = f^*(x)$, where $x$ and $y$
are numeric vectors. An MLP defines a mapping $y = f(x; \theta)$, where $\theta$
denotes weights of connections, which are usually trained using gradient descent
methods.

Recurrent neural networks (RNNs) approximate the mapping from a
sequence of inputs $x^{(1)}, \ldots, x^{(t)}$ to either a single output $y$ or a sequence of
outputs $y^{(1)}, \ldots, y^{(t)}$. An RNN defines a mapping $h^{(t)} = f(h^{(t-1)}, x^{(t)}; \theta)$, where
$h^{(t)}$ is the hidden state, from which the final output $y^{(t)}$ can be computed (e.g.
by a non-linear transformation or an MLP). A common RNN model is the long 
short-term memory network (LSTM) [16] which is used to learn long-term depen- 
dencies. Two common variants of LSTM are gated recurrent units (GRUs) [7] 
and tree-structured LSTM (Tree-LSTM) [35]. The former simplifies the LSTM 
for efficiency while the latter extends the modeling ability to tree structures. 

In many domains, graphs are used to represent data with rich structure, 
such as programs, molecules, social networks, and knowledge bases. Graph neu- 
ral networks (GNNs) [1,8,11,19,36] are commonly used to learn over graph- 
structured data. A GNN learns an embedding (i.e. real-valued vector) for each 
node of the given graph using a recursive neighborhood aggregation (or neu- 
ral message passing) procedure. After training, a node embedding captures the 
structural information within the node's K-hop neighborhood, where K is a 
hyper-parameter. A simple aggregation of all node embeddings or pooling [37] 
according to the graph structure summarizes the entire graph into an embed- 
ding. GNNs are parametrized with other models such as MLPs, which are the 
learnable non-linear transformations used in message passing, and GRUs, which 
are used to update the node embedding. 
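
The neighborhood aggregation described above can be illustrated with a minimal sketch. It is not the GNN used by Code2Inv: the learnable MLP and GRU components are replaced by a plain mean over each node and its neighbors (an assumption made to keep the example dependency-free), and the function and variable names are hypothetical. The point is only that after K rounds a node's embedding depends on its K-hop neighborhood.

```python
def message_passing(adj, features, rounds):
    """Run `rounds` steps of mean aggregation over adjacency lists.

    adj:      dict mapping each node to a list of neighbor nodes
    features: dict mapping each node to its initial embedding (list of floats)
    The mean aggregator stands in for the learnable transformations.
    """
    emb = {v: list(f) for v, f in features.items()}
    for _ in range(rounds):
        new = {}
        for v, nbrs in adj.items():
            group = [emb[v]] + [emb[u] for u in nbrs]
            # Component-wise mean over the node and its neighbors.
            new[v] = [sum(col) / len(group) for col in zip(*group)]
        emb = new
    return emb


# Path graph a - b - c; only node a carries a non-zero feature.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
features = {"a": [1.0], "b": [0.0], "c": [0.0]}

one_hop = message_passing(adj, features, rounds=1)
two_hop = message_passing(adj, features, rounds=2)
```

After one round, node c has only seen its neighbor b and still has a zero embedding; after two rounds, information from node a has reached it.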

Lastly, the generalization ability of neural networks can be improved by an 
external memory [12,13,33] which can be accessed using a differentiable attention
mechanism [5]. Given a set of neural embeddings, which form the external
memory, an attention mechanism assigns a likelihood to each embedding, under 
a given neural context. These likelihoods guide the selection of decisions that 
are represented by the chosen embeddings. 
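
As a rough illustration of such an attention step, the sketch below scores each memory embedding by its dot product with the context and normalizes the scores with a softmax. The softmax normalization and all names here are assumptions for the example; the text only states that a likelihood is assigned to each embedding.

```python
import math


def attention(context, memory):
    """Return one likelihood per embedding in `memory`, given `context`."""
    scores = [sum(c * m for c, m in zip(context, emb)) for emb in memory]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Three embeddings in the external memory and one context vector.
memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context = [2.0, 1.0]

weights = attention(context, memory)
# Index of the most likely embedding under this context.
best = max(range(len(memory)), key=lambda i: weights[i])
```

Choosing the embedding with the highest likelihood corresponds to picking the entry whose dot product with the context is largest.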


3 Framework 


We first describe the general framework, Code2Inv, and then illustrate two 
instances, namely, a loop invariant synthesizer for C programs and a CHC solver. 

Figure 1 defines the domains of program structures and neural structures used 
in Code2Inv. The framework is parameterized by graph constructors G that produce
graph representations of verification instance T and invariant grammar A,
denoted $G_{inst}$ and $G_{inv}$, respectively. The invariant grammar uses placeholder
symbols H, which represent abstract values of entities such as variables, constants,
and operators, and will be replaced by concrete values from the verification
instance during invariant generation. The framework requires a black-box
function check that takes a verification instance T and a candidate invariant
inv, and returns success (denoted $\bot$) or a counterexample cex.
Domains of Program Structures:

  $G(T) = G_{inst}$                          ($G_{inst}$ is graph representation of verification instance $T$)
  $G(A) = G_{inv}$                           ($G_{inv}$ is graph representation of invariant grammar $A$)
  $A = \langle \Sigma \uplus H, N, P, S \rangle$    (invariant grammar)
  $x \in H \uplus N$                         (set of placeholder symbols and non-terminals)
  $v \in \Sigma$                             (set of terminals)
  $n \in N$                                  (set of non-terminals)
  $p \in P$                                  (production rule)
  $S$                                        (start symbol)
  $inv \in \mathcal{L}(A)$                   (invariant candidate)
  $cex \in \mathcal{C}$                      (counterexample)
  $C \in \mathcal{P}(\mathcal{C})$           (set of counterexamples)
  $check(T, inv) \in \{\bot\} \uplus \mathcal{C}$   (invariant validation)

Domains of Neural Structures:

  $\pi = \langle \nu_T, \nu_A, \eta_T, \eta_A, \alpha_{ctx}, \varepsilon_{inv} \rangle$   (neural policy)
  $d$                                        (positive integer size of embedding)
  $\nu_T, \eta_T(G_{inst}) \in \mathbb{R}^{|G_{inst}| \times d}$   (graph embedding of verification instance)
  $\nu_A, \eta_A(G_{inv}) \in \mathbb{R}^{|G_{inv}| \times d}$     (graph embedding of invariant grammar)
  $ctx \in \mathbb{R}^d$                     (neural context)
  $state \in \mathbb{R}^d$                   (partially generated invariant state)
  $\alpha_{ctx} \in \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}^d$   (attention context)
  $\varepsilon_{inv} \in \mathcal{L}(A) \to \mathbb{R}^d$          (invariant encoder)
  $aggregate \in \mathbb{R}^{k \times d} \to \mathbb{R}^d$         (aggregation of embeddings)
  $\nu_A[n] \in \mathbb{R}^{k \times d}$     (embedding of production rules for non-terminal $n$,
                                             where $k$ is number of production rules of $n$ in $G_{inv}$)
  $\nu_T[h] \in \mathbb{R}^{k \times d}$     (embedding of nodes annotated by placeholder $h$,
                                             where $k$ is number of nodes annotated by $h$ in $G_{inst}$)


Fig. 1. Semantic domains. $\mathcal{L}(A)$ denotes the set of all sentential forms of $A$.


The key component of the framework is a neural policy $\pi$ which comprises
four neural networks. Two graph neural networks, $\eta_T$ and $\eta_A$, are used to compute
neural embeddings, $\nu_T$ and $\nu_A$, for graph representations $G_{inst}$ and $G_{inv}$,
respectively. The neural network $\alpha_{ctx}$, implemented as a GRU, maintains the
attention context ctx which controls the selection of the production rule to apply
or the concrete value to replace a placeholder symbol at each step of invariant
generation. The neural network $\varepsilon_{inv}$, implemented as a Tree-LSTM, encodes the
partially generated invariant into a numeric vector denoted state, which captures
the state of the generation that is used to update the attention context ctx.

Algorithm 1 depicts the main algorithm underlying Code2Inv. It takes a 
verification instance and a proof checker as input and produces an invariant 
that suffices to verify the given instance¹. At a high level, Code2Inv learns a
neural policy in lines 1-5. The algorithm first initializes the neural policy and
the set of counterexamples (lines 1-2). The algorithm then iteratively samples a
candidate invariant (line 4) and improves the policy using a reward for the new 


1 Fuzzers may be applied first so that the confidence of existence of a proof is high. 
Algorithm 1. Code2Inv Framework
Input: a verification instance T and a proof checker check
Output: an invariant inv satisfying check(T, inv) = ⊥
Parameter: graph constructor G and invariant grammar A
 1  π ← initPolicy(T, A)
 2  C ← ∅
 3  while true do
 4      inv ← sample(π, T, A)
 5      ⟨π, C⟩ ← improve(π, inv, C)
 6  Function initPolicy(T, A)
 7      Initialize weights of η_T, η_A, α_ctx, ε_inv with random values
 8      ν_T ← η_T(G(T))
 9      ν_A ← η_A(G(A))
10      return ⟨ν_T, ν_A, η_T, η_A, α_ctx, ε_inv⟩
11  Function sample(π, T, A)
12      inv ← A.S
13      ctx ← aggregate(π.ν_T)
14      while inv is partially derived do
15          x ← leftmost non-terminal or placeholder symbol in inv
16          state ← π.ε_inv(inv)
17          ctx ← π.α_ctx(ctx, state)
18          if x is non-terminal then
19              p ← attention(ctx, π.ν_A[x], G(A))
20              expand inv according to p
21          else
22              v ← attention(ctx, π.ν_T[x], G(T))
23              replace x in inv with v
24      return inv
25  Function improve(π, inv, C)
26      n ← number of counter-examples C that inv can satisfy
27      if n = |C| then
28          cex ← check(T, inv)
29          if cex = ⊥ then
30              save inv and weights of π
31              exit // a sufficient invariant is found
32          else
33              C ← C ∪ {cex}
34      r ← n/|C|
35      π ← updatePolicy(π, r)
36      return ⟨π, C⟩
37  Function updatePolicy(π, r)
38      Update weights of π.η_T, π.η_A, π.α_ctx, π.ε_inv, π.ν_T, π.ν_A by
39      standard policy gradient [34] using reward r
40  Function attention(ctx, ν, G)
41      Return node t in G such that dot product of ctx and ν[t]
42      is maximum over all nodes of G
candidate based on the accumulated counterexamples (line 5). We next elaborate
on the initialization, policy sampling, and policy improvement procedures.


Initialization. The initPolicy procedure (lines 6-10) initializes the neural policy.
All four neural networks are initialized with random weights (line 7), and
graph embeddings $\nu_T$, $\nu_A$ for verification task T and invariant grammar A are
computed by applying corresponding graph neural networks $\eta_T$, $\eta_A$ to their graph
representations G(T), G(A) respectively. Alternatively, the neural networks can
be initialized with pre-trained weights, which can boost overall performance. 


Neural Policy Sampling. The sample procedure (lines 11-24) generates a 
candidate invariant by executing the current neural policy. The candidate is 
first initialized to the start symbol of the given grammar (line 12), and then 
updated iteratively (lines 14-23) until it is complete (i.e. there are no non- 
terminals). Specifically, the candidate is updated by either expanding its leftmost 
non-terminal according to one of its production rules (lines 19-20) or by replacing 
its leftmost placeholder symbol with some concrete value from the verification 
instance (lines 22-23). The selection of a production rule or concrete value is done 
through an attention mechanism, which picks the most likely one according to 
the current context and corresponding region of external memory. The neural 
context is initialized to the aggregation of embeddings of the given verification 
instance (line 13), and then maintained by $\alpha_{ctx}$ (line 17) which, at each step,
incorporates the neural state of the partially generated candidate invariant (line
16), where the neural state is encoded by $\varepsilon_{inv}$.
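
The derivation loop of the sample procedure can be sketched without any neural components. In the sketch below, the grammar, the placeholder values, and the `choose` callback are all hypothetical: `choose` stands in for the attention-based selection, and the step budget is an addition of this example that guards against unbounded recursive expansion.

```python
# Hypothetical toy grammar; VAR, CONST, and OP are placeholder symbols.
GRAMMAR = {
    "S": [["C"], ["C", "&&", "S"]],
    "C": [["(", "E", "OP", "E", ")"]],
    "E": [["VAR"], ["CONST"]],
}
# Hypothetical concrete values taken from a verification instance.
INSTANCE = {"VAR": ["x", "y"], "CONST": ["0", "1000"], "OP": ["<", "<="]}


def sample(choose, max_steps=50):
    """Derive an invariant by repeatedly rewriting the leftmost open symbol."""
    inv = ["S"]  # start symbol
    for _ in range(max_steps):
        idx = next(
            (i for i, s in enumerate(inv) if s in GRAMMAR or s in INSTANCE),
            None,
        )
        if idx is None:
            return " ".join(inv)  # no non-terminal or placeholder is left
        sym = inv[idx]
        if sym in GRAMMAR:
            # Expand the non-terminal with the chosen production rule.
            inv[idx:idx + 1] = choose(GRAMMAR[sym])
        else:
            # Replace the placeholder with the chosen concrete value.
            inv[idx] = choose(INSTANCE[sym])
    return None  # derivation did not finish within the step budget


# A trivial policy that always picks the first option.
first_choice = sample(lambda options: options[0])
```

Replacing the `choose` callback with a learned selection is what turns this plain derivation into policy sampling.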


Neural Policy Improvement. The improve procedure (lines 25-36) improves 
the current policy by means of a continuous reward. Simply checking whether 
the current candidate invariant is sufficient or not yields a discrete reward of 
1 (yes) or 0 (no). This reward is too sparse to improve the policy, since most 
candidate invariants generated are insufficient, thereby almost always yielding 
a zero reward. Code2Inv addresses this problem by accumulating counterexam- 
ples provided by the checker. Whenever a new candidate invariant is generated, 
Code2Inv tests the number of counterexamples it can satisfy (line 26), and uses 
the fraction of satisfied counterexamples as the reward (line 34). If all counterex- 
amples are satisfied, Code2Inv queries the checker to validate the candidate (line 
28). If the candidate is accepted by the checker, then a sufficient invariant was 
found, and the learned weights of the neural networks are saved for speeding 
up similar verification instances in the future (lines 29-31). Otherwise, a new 
counterexample is accumulated (line 33). Finally, the neural policy (including 
the neural embeddings) is updated based on the reward. 
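
The reward computation can be sketched as follows. This is a simplified illustration under stated assumptions: a candidate invariant is modeled as a Python predicate over a state dictionary, a counterexample is a state that the candidate is expected to accept, and `toy_check` is a hypothetical stand-in for the proof checker. None of these names come from the Code2Inv implementation, and the policy update itself is omitted.

```python
def improve_step(candidate, counterexamples, check):
    """Compute the reward for one candidate; returns (reward, done)."""
    n = sum(1 for cex in counterexamples if candidate(cex))
    if n == len(counterexamples):
        # Only query the checker once all known counterexamples pass.
        cex = check(candidate)
        if cex is None:
            # Accepted: the algorithm would stop here, so the reward
            # value is irrelevant; 1.0 is returned for completeness.
            return 1.0, True
        counterexamples.append(cex)
    return n / len(counterexamples), False


def toy_check(candidate):
    """Hypothetical checker: the candidate must hold for x = 0..10."""
    for x in range(11):
        if not candidate({"x": x}):
            return {"x": x}
    return None


cexs = []
first = improve_step(lambda s: s["x"] < 5, cexs, toy_check)
second = improve_step(lambda s: s["x"] >= 0, cexs, toy_check)
```

The first candidate is rejected and contributes a counterexample, so it earns a reward of zero; the second candidate satisfies the accumulated counterexample and is then accepted by the checker.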


Framework Instantiations. We next show two instantiations of Code2Inv by 
customizing the graph constructor G. Specifically, we demonstrate two scenarios 
of graph construction: 1) by carefully exploiting task specific knowledge, and 2) 
with minimum information of the given task. 
    x1 = φ(x0, x2)
    y1 = φ(y0, y2)
    while (y1 < 1000) {
        x2 = x1 + y1
        y2 = y1 + 1
    }

(a)                    (b) [graph drawing not reproduced]


Fig. 2. (a) C program snippet in SSA form; (b) its graph representation.


Instantiation to Synthesize Loop Invariants for C Programs. An effective graph 
representation for a C program should reflect its control-flow and data-flow infor- 
mation. We leverage the static single assignment (SSA) transformation for this 
purpose. Figure 2 illustrates the graph construction process. Given a C program, 
we first apply SSA transformation as shown in Fig. 2a, from which a graph is 
constructed as shown in Fig. 2b. The graph is essentially a set of abstract syntax trees
(ASTs) augmented with control-flow (black dashed) edges and data-flow (blue 
dashed) edges. Different types of edges will be modeled as different message pass- 
ing channels used in graph neural networks so that rich structural information 
can be captured more effectively by the neural embeddings. Furthermore, certain 
nodes (marked black) are annotated with placeholder symbols and will be used 
to fill corresponding placeholders during invariant generation. For instance, vari- 
ables x and y are annotated with VAR, integer values 1000 and 1 are annotated 
with CONST, and the operator < is annotated with OP. 


[Figure panels not reproduced. Panel (a) is a CHC snippet that sets the logic to HORN, declares the relation itp over (Int Int), and states a rule over itp.]


Fig. 3. (a) CHC instance snippet; (b) node representation for the CHC example; (c)
example of invariant grammar; (d) node representation for the grammar.


Instantiation to Solve Constrained Horn Clauses (CHC). CHC are a uniform way 
to represent recursive, inter-procedural, and multi-threaded programs, and serve 
as a suitable basis for automatic program verification [6] and refinement type 
inference [21]. Solving a CHC instance involves determining unknown predicates 
that satisfy a set of logical constraints. Figure 3a shows a simple example of a
CHC instance where itp is the unknown predicate. It is easy to see that itp in 
fact represents an invariant of a loop. Thus, CHC solving can be viewed as a 
generalization of finding loop invariants [6]. 
Unlike C programs, which have explicit control-flow and data-flow information,
a CHC instance is a set of unordered Horn rules. The graph construction for
Horn rules is not as obvious as for C programs. Therefore, instead of deliberately 
constructing a graph that incorporates detailed domain-specific information, we 
use a node representation, which is a degenerate case of graph representation and 
requires only necessary nodes but no edges. Figure 3b shows the node represen- 
tation for the CHC example from Fig. 3a. The top two nodes are derived from 
the signature of unknown predicate itp and represent the first and the second 
arguments of itp. The bottom two nodes are constants extracted from the Horn 
rule. We empirically show that node representation works reasonably well. The 
downside of node representation is that no structural information is captured 
by the neural embeddings which in turn prevents the learned neural policy from 
generalizing to other structurally similar instances. 


Embedding Invariant Grammar. Lastly, both instantiations must define the 
embedding of the invariant grammar. The grammar can be arbitrarily defined, 
and similar to CHCs, there is no obvious information such as control- or data- 
flow to leverage. Thus, we use node representation for the invariant grammar 
as well. Figures 3c and 3d show an example of invariant grammar and its
node representation, respectively. Each node in the graph represents either a 
terminal or a production rule for a non-terminal. Note that this representation 
does not prevent the neural policy from generalizing to similar instances as long 
as they share the same invariant grammar. This is feasible because the invariant 
grammar does not contain instance specific details, which are abstracted away 
by placeholder symbols like VAR, CONST, and OP. 


4 Evaluation 


We first discuss the implementation, particularly the improvement over our pre- 
vious prototype [30], and then evaluate our framework in a number of aspects, 
such as performance, transferability, flexibility, and naturalness. 


Implementation. Code2Inv² consists of a frontend, which converts an instance
into a graph, and a backend, which maintains all neural components (i.e. neural 
embeddings and policy) and interacts with a checker. Our previous prototype has 
a very limited frontend based on CIL [24] and no notion of invariant grammar 
in the backend. We made significant improvements in both the frontend and the 
backend. We re-implemented the frontend for C programs based on Clang and 
implemented a new frontend for CHCs. We also re-implemented the backend to 
accept a configurable invariant grammar. Furthermore, we developed a standard 
graph format, which decouples the frontend and backend, and a clean interface 
between the backend and the checker. No changes are needed in the backend to 
support new instantiations. 


Evaluation Setup. We evaluate both instantiations of Code2Inv by comparing 
each instantiation with corresponding state-of-the-art solvers. For the task of 


2 Our artifacts are available on GitHub: https://github.com/PL-ML/code2inv.
synthesizing loop invariants for C programs, we use the same suite of benchmarks
from our previous work [30], which consists of 133 C programs from SyGuS [2]. 
We compare Code2Inv with our previous specialized prototype and three other 
state-of-the-art verification tools: C2I [29], LoopInvGen [26] and ICE-DT [10]. 
For the CHC solving task, we collect 120 CHC instances using SeaHorn [14] to 
reduce the C benchmark programs into CHCs.³ We compare Code2Inv with two
state-of-the-art CHC solvers: Spacer [18], which is the default fixedpoint engine
of Z3, and LinearArbitrary [38]. We run all solvers on a single 2.4 GHz AMD
CPU core for up to 12 h with up to 4 GB of memory. Unless specified otherwise,
Code2Inv is always initialized randomly, that is, untrained. 


Performance. Given that both the hardware and the software environments 
could affect the absolute running time and that all solvers for loop invariant 
generation for C programs rely on the same underlying SMT engine, Z3 [23], 
we compare the performance in terms of number of Z3 queries. We note that 
this is an imperfect metric but a relatively objective one that also highlights 
salient features of Code2Inv. Figure 4a shows the plot of verification cost (i.e. 
number of Z3 queries) by each solver and the number of C programs success- 
fully verified within the corresponding cost. Code2Inv significantly outperforms 
other state-of-the-art solvers in terms of verification cost and the general frame- 
work Code2Inv-G achieves performance comparable to (slightly better than) the 
previous specialized prototype Code2Inv-S. 


[Plots not reproduced. Both panels use "# instances solved" on the x-axis. Legend of (a): C2I, PIE, ICE-DT, Code2Inv-S, Code2Inv-G. Legend of (b): untrained, pre-trained.]


Fig. 4. (a) Comparison of Code2Inv with state-of-the-art solvers; (b) comparison
between untrained model and pre-trained model.


Transferability. Another hallmark of Code2Inv is that, along with the desired 
loop invariant, it also learns a neural policy. To evaluate the performance ben- 
efits of the learned policy, we randomly perturb the C benchmark programs by 
various edits (e.g. renaming existing variables and injecting new variables and 


3 SeaHorn produces empty Horn rules on 13 (out of 133) C programs due to optimiza- 
tions during VC generation that result in proving the assertions of interest. 
statements). For each program, we obtain 100 variants, and use 90 for training
and 10 for testing. Figure 4b shows the performance difference between the
untrained model (i.e. initialized with random weights) and the pre-trained model 
(i.e. initialized with pre-trained weights). Our results indicate that the learned 
neural policy can be transferred to accelerate the search for loop invariants for 
similar programs. This is especially useful in the CI/CD setting [25] where pro- 
grams evolve incrementally and quick turnaround time is indispensable. 


Flexibility. Code2Inv can be instantiated or extended in a very flexible manner. 
For one instance, with a simple frontend (e.g. node representation as discussed 
above), Code2Inv can be customized as a CHC solver. Our evaluation shows 
that, without any prior knowledge about Horn rules, Code2Inv can solve 94 
(out of 120) CHC instances. Although it is not on a par with state-of-the-art 
CHC solvers Spacer and LinearArbitrary, which solve 112 and 118 instances, 
respectively, Code2Inv provides new insights for solving CHCs and could be 
further improved by better embeddings and reward design. 

As another example, by simply adjusting the invariant grammar, Code2Inv 
is immediately ready for solving CHC tasks involving non-linear arithmetic. 
Our case study shows that Code2Inv successfully solves 5 (out of 7) non-linear 
instances we created⁴, while both Spacer and LinearArbitrary failed to solve
any of them. Tasks involving non-linear arithmetic are particularly challenging 
because the underlying checker is more likely to get stuck, and no feedback 
(e.g. counterexample) can be provided, which is critical for existing solvers like 
Spacer and LinearArbitrary to make progress. This highlights another strength of 
Code2Inv—even if the checker gets stuck, the learning process can still continue 
by simply assigning zero or negative reward. 


Solution found by Spacer:

(and (or (not (<= B 16)) (not (>= A 8)))
(not (<= B 0))
(or (not (<= B 2)) (<= A 0))
(or (not (<= B 4)) (not (>= A 2)))
(or (not (<= B 6)) (not (>= A 3)))
(or (not (<= B 8)) (not (>= A 4)))
(or (not (<= B 10)) (not (>= A 5)))
(or (not (<= B 12)) (not (>= A 6)))
(or (not (<= B 14)) (not (>= A 7)))))))

Code2Inv: (<= V0 (- V1 V0))

(a) Spacer on add2.smt


Solution found by LinearArbitrary:

(or
(and true !(V0<=-50)
V1<=5 ((1*V0)+(-1*V1))<=-45
V1<=4 !(((1*V0)+(-1*V1))<=-51)
!(V1<=2) !(((1*V0)+(-1*V1))<=-50)
!(V1<=3) ((1*V0)+(1*V1))<=-40
)
... // omitting other 4 similar (and ...)
)

Code2Inv: (or (< V0 (+ 0 0)) (> V1 V0))

(b) LinearArbitrary on 84.c.smt


Fig. 5. Comparison of solution naturalness.


Naturalness. Our final case study concerns the naturalness of solutions. As 
illustrated in Fig. 5, solutions discovered by Code2Inv tend to be more natural,
whereas Spacer and LinearArbitrary tend to find solutions that unnecessarily
depend on constants from the given verification instance. Such overfitted
solutions may become invalid when these constants change. Note that


4 The non-linear instances we created are available in the artifact.
expressions such as (+ 0 0) in Code2Inv's solutions can be eliminated by
post-processing simplification akin to peephole optimization in compilers.
Alternatively, the reward mechanism in Code2Inv could incorporate a regularizer on the
naturalness.
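
Such a post-processing pass could look like the following sketch. It is not part of Code2Inv as described here; the nested-tuple encoding of S-expressions and the two rewrite rules (constant folding of sums and dropping zero summands) are assumptions chosen for illustration.

```python
def simplify(expr):
    """Simplify an S-expression given as nested tuples, e.g. ("+", 0, 0)."""
    if not isinstance(expr, tuple):
        return expr  # variable name or integer constant
    op, *args = expr
    args = [simplify(a) for a in args]
    if op == "+":
        if all(isinstance(a, int) for a in args):
            return sum(args)  # fold a sum of constants
        args = [a for a in args if a != 0]  # x + 0 is x
        if len(args) == 1:
            return args[0]
    return (op, *args)


# The Code2Inv solution (or (< V0 (+ 0 0)) (> V1 V0)) as nested tuples.
solution = ("or", ("<", "V0", ("+", 0, 0)), (">", "V1", "V0"))
simplified = simplify(solution)
```

On this input the sub-expression (+ 0 0) is folded to the constant 0, giving (or (< V0 0) (> V1 V0)).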


Limitations. Code2Inv does not support finding loop invariants for programs 
with multiple loops, function calls, or recursion. Code2Inv generally runs slower 
compared to other contemporary approaches. Specifically, 90% of the solved C 
instances took 2h or less, and the rest could take up to 12 hours to solve. This 
could be improved upon by leveraging GPUs, developing more efficient training 
algorithms, or leveraging templates [27]. 


5 Conclusion 


We presented a framework Code2Inv which automatically learns invariants 
(or more generally unknown predicates) by interacting with a proof checker. 
Code2Inv is a general and learnable tool for solving many different verification 
tasks and can be flexibly configured with a grammar and a graph constructor. 
We compared its performance with state-of-the-art solvers for both C programs 
and CHC formulae, and showed that it can adapt to different types of inputs 
with minor changes. We also showed, by simply varying the input grammar, how 
it can tackle non-linear invariant problems which other solvers are not equipped 
to work with, while still giving results that are relatively natural to read. 
Abstract. Witness validation is an important technique to increase trust
in verification results, by making descriptions of error paths (violation 
witnesses) and important parts of the correctness proof (correctness wit- 
nesses) available in an exchangeable format. This way, the verification 
result can be validated independently from the verification in a second 
step. The problem is that there are unfortunately not many tools avail- 
able for witness-based validation of verification results. We contribute to 
closing this gap with the approach of validation via verification, which is 
a way to automatically construct a set of validators from a set of existing 
verification engines. The idea is to take as input a specification, a program, 
and a verification witness, and produce a new specification and a trans- 
formed version of the original program such that the transformed program 
satisfies the new specification if the witness is useful to confirm the result 
of the verification. Then, an ‘off-the-shelf’ verifier can be used to validate 
the previously computed result (as witnessed by the verification witness) 
via an ordinary verification task. We have implemented our approach in 
the validator METAVAL, and it was successfully used in SV-COMP 2020,
where it confirmed 3 653 violation witnesses and 16 376 correctness witnesses.
The results show that METAVAL improves the effectiveness (167 uniquely
confirmed violation witnesses and 833 uniquely confirmed correctness 
witnesses) of the overall validation process, on a large benchmark set. All 
components and experimental data are publicly available. 


Keywords: Computer-aided verification - Software verification - Program 
analysis - Software model checking - Certification - Verification witnesses - 
Validation of verification results - Reducer 


1 Introduction 


Formal software verification becomes more and more important in the development
process for software systems of all types. There are many verification tools 
available to perform verification [4]. One of the open problems that was addressed 
only recently is the topic of results validation [10-12,37]: The verification 
work is often done by untrusted verification engines, on untrusted computing 
infrastructure, or even on approximating computation systems, and static-analysis 
tools suffer from false positives that engineers in practice hate because they are 
tedious to refute [20]. Therefore, it is necessary to validate verification results, 


This work was funded by the Deutsche Forschungsgemeinschaft (DFG) — 378803395. 
ideally by an independent verification engine that likely does not have the same
weaknesses as the original verifier. Witnesses also serve as an interface to
the verification engine, in order to overcome integration problems [1]. 

The idea to witness the correctness of a program by annotating it with 
assertions is as old as programming [38], and from the beginning of model checking 
it was felt necessary to witness counterexamples [21]. Certifying algorithms [30] 
are not only computing a solution but also produce a witness that can be used by 
a computationally much less expensive checker to (re-)establish the correctness 
of the solution. In software verification, witnesses became standardized¹ and
exchangeable about five years ago [10,11]. In the meanwhile, the exchangeable 
witnesses can be used also for deriving tests from witnesses [12], such that an 
engineer can study an error report additionally with a debugger. The ultimate 
goal of this direction of research is to obtain witnesses that are certificates and 
can be checked by a fully trusted validator based on trusted theorem provers, 
such as Coq and Isabelle, as done already for computational models that are 
'easier' than C programs [40]. 

Yet, although considered very useful, there are not many witness validators 
available. For example, the most recent competition on software verification 
(SV-COMP 2020)² showcases 28 software verifiers but only 6 witness validators.
Two were published in 2015 [11], two more in 2018 [12], the fifth in 2020 [37], and 
the sixth is METAVAL, which we describe here. Witness validation is an interesting 
problem to work on, and there is a large, yet unexplored field of opportunities. It 
involves many different techniques from program analysis and model checking. 
However, it seems that this also requires a lot of engineering effort. 

Our solution validation via verification is a construction that takes as input 
an off-the-shelf software verifier and a new program transformer, and composes a 
witness validator in the following way (see Fig. 1): First, the transformer takes the 
original input program and transforms it into a new program. In case of a violation 
witness, which describes a path through the program to a specific program location, 
we transform the program such that all parts that are marked as unnecessary 
for the path by the witness are pruned. This is similar to the reducer for a 
condition in reducer-based conditional model checking [14]. In case of a correctness 
witness, which describes invariants that can be used in a correctness proof, we 
transform the program such that the invariants are asserted (to check that they 
really hold) and assumed (to use them in a re-constructed correctness proof). 
A standard verification engine is then asked to verify that (1) the transformed 
program contains a feasible path that violates the original specification (violation 
witness) or (2) the transformed program satisfies the original specification and 
all assertions added to the program hold (correctness witness). 

METAVAL is an implementation of this concept. It performs the transformation 
according to the witness type and specification, and can be configured to use 
any of the available software verifiers³ as verification backend.


1 Latest version of standardized witness format: https://github.com/sosy-lab/sv-witnesses
2 https://sv-comp.sosy-lab.org/2020/systems.php
3 https://gitlab.com/sosy-lab/sv-comp/archives-2020/tree/master/2020 
[Diagram not reproduced. The validator takes Program, Witness, and Specification as inputs; a Transformer produces Program' and Specification', which are passed to a Verifier (CPACHECKER, SYMBIOTIC, or ULTIMATE AUTOMIZER) that reports TRUE, UNKNOWN, or FALSE.]


Fig. 1. Validator construction using readily available verifiers


Contributions. METAVAL contributes several important benefits: 


• The program transformer was a one-time effort and is available from now on.

• Any existing standard verifier can be used as verification backend.

• Once a new verification technology becomes available in a verification tool, it
can immediately be turned into a validator using our new construction.

• Technology bias can be avoided by complementing the verifier by a validator
that is based on a different technology.

• Selecting the strongest verifiers (e.g., by looking at competition results) can
lead to strong validators.

• All data and software that we describe are publicly available (see Sect. 6).


2 Preliminaries 


For the theoretical part, we will have to set a common ground for the concepts 
of verification witnesses [10,11] as well as reducers [14]. In both cases, programs 
are represented as control-flow automata (CFAs). A control-flow automaton 
$C = (L, l_0, G)$ consists of a set $L$ of control locations, an initial location $l_0 \in L$,
and a set $G \subseteq L \times Ops \times L$ of control-flow edges that are labeled with the
operations in the program. In the mentioned literature on witnesses and reducers,
a simple programming language is used in which operations are either assignments
or assumptions over integer variables. Operations $op \in Ops$ in such a language
can be represented by formulas in first order logic over the sets $V, V'$ of program
variables before and after the transition, which we denote by $op(V, V')$. In order to
simplify our construction later on, we will also allow mixed operations of the form
$f(V) \wedge (x' = g(V))$ that combine assumptions with an assignment, which would
otherwise be represented as an assumption followed by an assignment operation. 
 1  void fun(uint x, uint y, uint z) {
 2    if (x > y) {
 3      z = 2*x-y;
 4    } else {
 5      z = 2*y-x+1;
 6    }
 7    if (z>y || z>x) {
 8      return;
 9    } else {
10      error();
11    }
12  }

Fig. 2. Example program for both correctness and violation witness validation


[CFA drawing not reproduced. Its edges are labeled x>y, x<=y, z=2*x-y;, z=2*y-x+1;, z>x||z>y, and !(z>x||z>y).]

Fig. 3. CFA C of example program from Fig. 2


The conversion from the source code into a CFA and vice versa is
straightforward, provided that the CFA is deterministic. A CFA is called
deterministic if, whenever a location l has multiple outgoing CFA edges, the
assumptions on those edges are mutually exclusive (but not necessarily exhaustive).

Since our goal is to validate (i.e., prove or falsify) the statement that a program 
fulfills a certain specification, we need to additionally model the property to 
be verified. For properties that can be translated into non-reachability, this can 
be done by defining a set T C L of target locations that shall not be reached. 
For the example program in Fig.2 we want to verify that the call in line 10 
is not reachable. In the corresponding CFA in Fig. 3 this is represented by the 
reachability of the location labeled with 10. Depending on whether or not a 
verifier accounts for the overflow in this example program, it will either consider 
the program safe or unsafe, which makes it a perfect example that can be used 
to illustrate both correctness and violation witnesses. 
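
The effect of the overflow can be made concrete with a small sketch. The
following Python model of the program from Fig. 2 is an illustration only: it
assumes that uint is a 32-bit unsigned type with wrap-around arithmetic, and the
names UINT_MAX and reaches_error are our own and not part of the example.

```python
UINT_MAX = 2**32 - 1  # assumption: uint is 32 bits wide


def reaches_error(x: int, y: int, wraparound: bool) -> bool:
    """Return True if error() in line 10 of Fig. 2 is reached for inputs x, y."""
    if x > y:
        z = 2 * x - y
    else:
        z = 2 * y - x + 1
    if wraparound:
        z &= UINT_MAX  # model unsigned overflow as reduction modulo 2**32
    return not (z > y or z > x)


# Over unbounded integers, z always exceeds x or y, so error() is unreachable
# (checked here only for a small range of inputs).
assert not any(reaches_error(x, y, False) for x in range(50) for y in range(50))

# With wrap-around, x = 2**31 and y = 0 yield z = 0, and error() is reached.
assert reaches_error(2**31, 0, True)
```

A verifier that models uint as a mathematical integer corresponds to
wraparound=False and would report the program as safe, while a verifier that
models machine arithmetic corresponds to wraparound=True and would report a
violation.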

In order to reason about the soundness of our approach, we need to also
formalize the program semantics. This is done using the concept of concrete
data states. A concrete data state is a mapping from the set $V$ of program
variables to their domain $\mathbb{Z}$, and a concrete state is a pair of control
location and concrete data state. A concrete program path is then defined as a
sequence $\pi = (c_0, l_0) \xrightarrow{g_1} \dots \xrightarrow{g_n} (c_n, l_n)$
where $c_0$ is the initial concrete data state,
$g_i = (l_{i-1}, op_i, l_i) \in G$, and $c_{i-1}(V), c_i(V') \models op_i$. A
concrete execution $ex(\pi)$ is then derived from a path $\pi$ by only looking at
the sequence $(c_0, l_0) \dots (c_n, l_n)$ of concrete states from the path. Note
that we deviate here from the definition given in [14], where concrete executions
do not contain information about the program locations. This is necessary here
since we want to reason about the concrete executions that fulfill a given
non-reachability specification, i.e., that never reach certain locations in the
original program.

Witnesses are formalized using the concept of protocol automata [11]. A
protocol automaton $W = (Q, \Sigma, \delta, q_0, F)$ consists of a set $Q$ of
states, a set of transition labels $\Sigma = 2^G \times \Phi$, a transition
relation $\delta \subseteq Q \times \Sigma \times Q$, an initial state $q_0$, and
a set $F \subseteq Q$ of final states. A state is a pair that consists of a name
to identify the state and a predicate over the program variables $V$ to represent
the state invariant.⁴ A transition label is a pair that consists of a subset of
control-flow edges and a predicate over the program variables $V$ to represent
the guard condition for the transition to be taken. An observer automaton
[11,13,32,34,36] is a protocol automaton that does not restrict the state space,
i.e., for each state $q \in Q$ the disjunction of the guard conditions of all
outgoing transitions is a tautology. Violation witnesses are represented by
protocol automata in which all state invariants are true. Correctness witnesses
are represented by observer automata in which the set of final states is empty.


3 Approach 


3.1 From Witnesses to Programs 


When given a CFA $C = (L, l_0, G)$, a specification $T \subseteq L$, and a witness
automaton $W = (Q, \Sigma, \delta, q_0, F)$, we can construct a product automaton
$A_{C \times W} = (L \times Q, (l_0, q_0), \Gamma, T \times F)$ where
$\Gamma \subseteq (L \times Q) \times (Ops \times \Phi) \times (L \times Q)$.
The new transition relation $\Gamma$ is defined by allowing for each transition
$g$ in the CFA only those transitions $(S, \varphi)$ from the witness where
$g \in S$ holds:

$$\Gamma = \{\, ((l_i, q_i), (op, \varphi), (l_j, q_j)) \mid \exists S : (q_i, (S, \varphi), q_j) \in \delta,\ (l_i, op, l_j) \in S \,\}$$
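
The definition of the transition relation of the product automaton can be
sketched in a few lines of Python. This is an illustration under simplifying
assumptions, not the implementation used in METAVAL: operations and guards are
plain strings, and the CFA fragment, the witness, and all state names below are
hypothetical.

```python
def product_transitions(cfa_edges, witness_transitions):
    """Compute the product transition relation for CFA edges G and witness transitions delta."""
    gamma = set()
    for q_i, (edge_set, guard), q_j in witness_transitions:
        for l_i, op, l_j in cfa_edges:
            # keep a CFA edge g only if the witness transition allows it (g in S)
            if (l_i, op, l_j) in edge_set:
                gamma.add(((l_i, q_i), (op, guard), (l_j, q_j)))
    return gamma


# Hypothetical CFA fragment: locations are line numbers, operations are strings.
e1 = (2, "x>y", 3)
e2 = (2, "x<=y", 5)
e3 = (3, "z=2*x-y;", 7)
cfa_edges = {e1, e2, e3}

# Hypothetical witness that follows only the then-branch; all guards are "true".
witness_transitions = {
    ("q0", (frozenset({e1}), "true"), "q1"),
    ("q1", (frozenset({e3}), "true"), "q1"),
}

gamma = product_transitions(cfa_edges, witness_transitions)
```

In this example the edge e2 is not contained in any edge set of the witness, so
the else-branch does not appear in the product, which illustrates how a witness
restricts the search space.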


We can now define the semantics of a witness by looking at the paths
in the product automaton and mapping them to concrete executions in
the original program. A path of the product automaton $A_{C \times W}$ is a
sequence $(l_0, q_0) \xrightarrow{\alpha_0} \dots \xrightarrow{\alpha_{n-1}} (l_n, q_n)$
such that $((l_i, q_i), \alpha_i, (l_{i+1}, q_{i+1})) \in \Gamma$ and
$\alpha_i = (op_i, \varphi_i)$.

It is evident that the automaton $A_{C \times W}$ can easily be mapped to a new
program $C_{C \times W}$ by reducing the pair $(op, \varphi)$ in its transition
relation to an operation $\overline{op}$. In case $op$ is a pure assumption of
the form $f(V)$, then $\overline{op}$ will simply be $f(V) \wedge \varphi(V)$. If
$op$ is an assignment of the form $f(V) \wedge (z' = g(V))$, then $\overline{op}$
will be $(f(V) \wedge \varphi(V)) \wedge (z' = g(V))$. This construction has the
drawback that the resulting CFA might be non-deterministic, but this is actually
not a problem when the corresponding program is only used for verification. The
non-determinism can be expressed in the source code by using non-deterministic
values, which are already formalized by the community and established in the
SV-COMP rules, and therefore also supported by all participating verifiers. The
concrete executions of $C_{C \times W}$ can be identified with concrete executions
of $C$ by projecting their pairs $(l, q)$ on their first element. Let
$proj_C(ex(C_{C \times W}))$ denote the set of concrete executions that is derived
this way. Due to how the relation $\Gamma$ of $A_{C \times W}$ is constructed, it
is guaranteed that this is a subset of the executions of $C$, i.e.,
$proj_C(ex(C_{C \times W})) \subseteq ex(C)$. In this respect the witness acts in
very much the same way as a reducer [14], and the reduction of the search space
is also one of the desired properties of a validator for violation witnesses.


4 These invariants are the central piece of information in correctness witnesses. While
invariants that prove a program correct can be hard to come up with, they are usually
easier to check.
Fig. 4. Violation witness $W_V$ (figure not reproduced; transitions are labeled
with pairs of an edge set and a guard, e.g., {x>y},T)

Fig. 5. Product automaton $A_{C \times W_V}$ (figure not reproduced)


3.2 Programs from Violation Witnesses 


For explaining the validation of results based on a violation witness, we consider
the witness in Fig. 4 for our example C program in Fig. 2. The program
$C_{C \times W_V}$ resulting from product automaton $A_{C \times W_V}$ in Fig. 5
can be passed to a verifier. If this verification finds an execution that reaches
a specification violation, then this violation is guaranteed to be also present in
the original program. There is, however, one caveat: In the example in Fig. 5, a
reachable state $(10, q_0)$ at program location 10 (i.e., a state that violates
the specification) can be found that is not marked as accepting state in the
witness automaton $W_V$. For a strict version of witness validation, we can remove
all states that are in $T \times Q$ but not in $T \times F$ from the product
automaton, and thus, from the generated program. This will ensure that if the
verifier finds a violation in the generated program, the witness automaton also
accepts the found error path. The version of METAVAL that was used in SV-COMP 2020
did not yet support strict witness validation.
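
The pruning step for strict witness validation can be sketched as follows. The
sketch assumes that product transitions are stored as triples of source state,
label, and target state, where each product state is a pair of control location
and witness state; the function name strict_prune and the example states are
hypothetical.

```python
def strict_prune(gamma, targets, final_states):
    """Drop all product states in T x Q that are not in T x F, with their transitions."""

    def removed(state):
        location, witness_state = state
        return location in targets and witness_state not in final_states

    return {
        (src, label, dst)
        for (src, label, dst) in gamma
        if not removed(src) and not removed(dst)
    }


# Hypothetical product fragment: location 10 is the target location,
# and "qE" is the only final (accepting) state of the witness.
gamma = {
    ((7, "q0"), ("!(z>x||z>y)", "true"), (10, "q0")),
    ((7, "q1"), ("!(z>x||z>y)", "true"), (10, "qE")),
}
pruned = strict_prune(gamma, targets={10}, final_states={"qE"})
```

After pruning, the target location can only be reached together with an
accepting witness state, so every violation found in the generated program is
also accepted by the witness automaton.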


3.3 Programs from Correctness Witnesses 


Correctness witnesses are represented by observer automata. Figure 6 shows a
potential correctness witness $W_C$ for our example program $C$ in Fig. 2, where
the invariants are annotated in bold font next to the corresponding state. The
construction of the product automaton $A_{C \times W_C}$ in Fig. 7 is a first step
towards reestablishing the proof of correctness: the product states tell us to
which control locations of the CFA for the program the invariants from the
witness belong. The idea of a result validator for correctness witnesses is to


1. check the invariants in the witness and 
2. use the invariants to establish that the original specification holds. 


Fig. 6. Correctness witness $W_C$ (figure not reproduced; states are annotated
with invariants such as z>y and z>x)

Fig. 7. Product automaton $A_{C \times W_C}$ (figure not reproduced)

We can achieve the second goal by extracting the invariants from each state in the
product automaton $A_{C \times W_C}$ and adding them as conditions to all edges by
which the state can be reached. This will then be semantically equivalent to
assuming that the invariants hold at the state and potentially make the
subsequent proof easier. For soundness we need to also ensure the first goal. To
achieve that, we add transitions into a (new) accepting state from $T \times F$
whenever we transition into a state $q$ and the invariant of $q$ does not hold,
and we add self-loops such that the automaton stays in the new accepting state
forever. In sum, for each invariant, there are two transitions, one with the
invariant as guard (to assume that the invariant holds) and one with the negation
of the invariant as guard (to assert that the invariant holds, going to an
accepting (error) state if it does not hold). This transformation ensures that
the resulting automaton after the transformation is still a proper observer
automaton.
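
The instrumentation of the invariants can be sketched as follows. This is a
simplified model and not the code of METAVAL: invariants are strings, each label
is extended by a third component that is meant to be checked after the operation
has been executed, and the state name ERROR as well as the example transition
are hypothetical.

```python
ERROR = "q_err"  # hypothetical name for the new accepting (error) state


def instrument(gamma, invariant):
    """Add assume and assert transitions for the invariants of the target states.

    invariant maps a product state to its invariant; missing entries mean "true".
    """
    result = set()
    for src, (op, guard), dst in gamma:
        inv = invariant.get(dst, "true")
        # assume: the target state is only entered if its invariant holds
        result.add((src, (op, guard, inv), dst))
        if inv != "true":
            # assert: a violated invariant leads to the error state
            result.add((src, (op, guard, f"!({inv})"), ERROR))
    # self-loop: once the error state is reached, the automaton stays there
    result.add((ERROR, ("skip", "true", "true"), ERROR))
    return result


# Hypothetical product transition whose target state carries the invariant z>x.
gamma = {((3, "q1"), ("z=2*x-y;", "true"), (7, "q2"))}
instrumented = instrument(gamma, {(7, "q2"): "z>x"})
```

For every invariant different from true, the two generated guards are
complementary, which is why the disjunction of the outgoing guards remains a
tautology.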


4 Evaluation 


This section describes the results that were obtained in the 9th Competition
on Software Verification (SV-COMP 2020), in which METAVAL participated as a
validator. We did not perform a separate evaluation because the results of
SV-COMP are complete, accurate, and reproducible; all data and tools are publicly
available for inspection and replication studies (see data availability in Sect. 6).


4.1 Experimental Setup 


Execution Environment. In SV-COMP 2020, the validators were executed in 
a benchmark environment that makes use of a cluster with 168 machines, each 
of them having an Intel Xeon E3-1230 v5 CPU with 8 processing units, 33 GB 
of RAM, and the GNU/Linux operating system Ubuntu 18.04. Each validation 
run was limited to 2 processing units and 7 GB of RAM, in order to allow up to 
4 validation runs to be executed on the same machine at the same time. The time 
limit for a validation run was set to 15 min for correctness witnesses and to 90s 
for violation witnesses. The benchmarking framework BENCHEXEC 2.5.1 was used 
to ensure that the different runs do not influence each other and that the resource 
limits are measured and enforced reliably [15]. The exact information to replicate 
the runs of SV-COMP 2020 can be found in Sect. 3 of the competition report [4]. 


Benchmark Tasks. The verification tasks⁵ of SV-COMP can be partitioned
with respect to their specification into ReachSafety, MemSafety, NoOverflows, and
Termination. Validators can be configured using different options for each
specification.


5 https://github.com/sosy-lab/sv-benchmarks/tree/svcomp20 
Table 1. Overview of validation for violation witnesses in SV-COMP 2020

Specification       Measure             CPACHECKER  CPA-W2T  FSHELL-W2T  METAVAL  NITWIT  UAUTOMIZER
ReachSafety         executed on              35652    25812       25812    35652   21636       25812
(35652 witnesses)   uniquely confirmed        3043       42         175       44     398         547
                    jointly confirmed         8019     6010        6740     1566    8055        3802
Termination         executed on               3043        -           -     9720       -        9720
(9720 witnesses)    uniquely confirmed         566        -           -        9       -         235
                    jointly confirmed         1539        -           -      256       -        1493
NoOverflow          executed on               3149     3149        3149     3149       -        3149
(3149 witnesses)    uniquely confirmed           6        1          31        1       -          89
                    jointly confirmed         1668     1067        1267     1186       -        1590
MemSafety           executed on               2681     2213        2681     2681       -        2681
(2681 witnesses)    uniquely confirmed         278        0          21      113       -          44
                    jointly confirmed          737      250         364      478       -         372


Table 2. Overview of validation for correctness witnesses in SV-COMP 2020

Specification       Measure             CPACHECKER  METAVAL  UAUTOMIZER
ReachSafety         executed on              66435    66435       66435
(66435 witnesses)   uniquely confirmed        1750      391         708
                    jointly confirmed        17592    13862       16834
NoOverflow          executed on                  -     3179        3179
(3179 witnesses)    uniquely confirmed           -       44          74
                    jointly confirmed            -      870         870
MemSafety           executed on                  -     4426        4426
(4426 witnesses)    uniquely confirmed           -      398         173
                    jointly confirmed            -      811         811


Validator Configuration. Since our architecture (cf. Fig.1) allows for a 
wide range of verifiers to be used for validation, there are many interesting 
configurations for constructing a validator. Exploring all of these in order to 
find the best configuration, however, would require significant computational 
resources, and also be susceptible to over-fitting. Instead, we chose a heuristic 
based on the results of the competition from the previous year, i.e., SV-COMP 
2019 [3]. The idea is that a verifier which performed well at verifying tasks for a 
specific specification is also a promising candidate to be used in validating results 
for that specification. Therefore the configuration of our validator METAVAL 
uses CPA-SEQ as verifier for tasks with specification ReachSafety, ULTIMATE 
AUTOMIZER for NoOverflow and Termination, and SYMBIOTIC for MemSafety. 
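
The configuration described above amounts to a mapping from specifications to
backend verifiers. The following sketch only illustrates this idea; the
dictionary and the function select_backend are hypothetical and do not reflect
the actual configuration format of METAVAL.

```python
# Backend verifier per specification, as chosen from the SV-COMP 2019 results.
BACKEND_BY_SPECIFICATION = {
    "ReachSafety": "CPA-Seq",
    "NoOverflow": "Ultimate Automizer",
    "Termination": "Ultimate Automizer",
    "MemSafety": "Symbiotic",
}


def select_backend(specification: str) -> str:
    """Return the name of the verifier that validates witnesses for a specification."""
    return BACKEND_BY_SPECIFICATION[specification]
```

Exchanging a backend then only requires changing one entry of the mapping, which
is what keeps the construction of a new validator free of development effort.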


4.2 Results 


The results of the validation phase in SV-COMP 2020 [5] are summarized in 
Table 1 (for violation witnesses) and Table 2 (for correctness witnesses). For each 
specification, METAVAL was able to not only confirm a large number of results 
that were also validated by other tools, but also to confirm results that were not
previously validated by any of the other tools.

For violation witnesses, we can observe that METAVAL confirms significantly
fewer witnesses than the other validators. This can be explained partially by
the restrictive time limit of 90 s. Our approach not only adds overhead when
generating the program from the witness, but this new program can also be
harder to parse and analyze for the verifier we use in the backend. It is also the
case that the verifiers that we use in METAVAL are not tuned for such a short
time limit, as a verifier in the competition will always get the full 15 min. For
specification ReachSafety, for example, we use CPA-SEQ, which starts with a
very simple analysis and switches verification strategies after a fixed time that
happens to be also 90 s. So in this case we will never benefit from the more
sophisticated strategies that CPA-SEQ offers.

For validation of correctness witnesses, where the time limit is higher, this 
effect is less noticeable such that the number of results confirmed by METAVAL is 
more in line with the numbers achieved by the other validators. For specification 
MemSafety, METAVAL even confirms more correctness witnesses than ULTIMATE 
AUTOMIZER. This indicates that SYMBIOTIC was a good choice in our configuration 
for that specification. SYMBIOTIC generally performs much better in verification 
of MemSafety tasks than ULTIMATE AUTOMIZER, so this result was expected. 

Before the introduction of METAVAL, there was only one validator for correct- 
ness witnesses in the categories NoOverflow and MemSafety, while constructing 
a validator for those categories with our approach did not require any addi- 
tional development effort. 


5 Related Work 


Programs from Proofs. Our approach for generating programs can be seen as a 
variant of the Programs from Proofs (PfP) framework [27,41]. Both generate 
programs from an abstract reachability graph of the original program. The 
difference is that PfP tries to remove all specification violations from the graph, 
while we just encode them into the generated program as violation of the 
standard reachability property. We do this for the original specification and 
the invariants in the witness, which we treat as additional specifications. 


Automata-Based Software Model Checking. Our approach is also similar to that of 
the validator ULTIMATE AUTOMIZER [10]. For violation witnesses, it also constructs 
the product of CFA and witness. For correctness witnesses, it instruments the 
invariants directly into the CFA of the program (see [10], Sect. 4.2) and passes the 
result to its verification engine, while METAVAL constructs the product of CFA 
and witness, and applies a similar instrumentation. In both cases, METAVAL’s 
transformer produces a C program, which can be passed to an independent verifier. 


Reducer-Based Conditional Model Checking. The concept of generating programs 
from an ARG has also been used to successfully construct conditional verifiers [14]. 


6 In the statistics, a witness is only counted as confirmed if the verifier correctly stated 
whether the input program satisfies the respective specification. 
Our approach for correctness witnesses can be seen as a special case of this
technique, where METAVAL acts as initial verifier that does not try to reduce the 
search space and instead just instruments the invariants from the correctness 
witness as additional specification into the program. 


Verification Artifacts and Interfacing. The problem that verification results are 
not treated well enough by the developers of verification tools is known [1] and 
there are also other works that address the same problem, for example, the work 
on execution reports [19] or on cooperative verification [17]. 


Test-Case Generation. The idea to generate test cases from verification coun- 
terexamples is more than ten years old [8,39], has since been used to create 
debuggable executables [31,33], and was extended and combined to various 
successful automatic test-case generation approaches [24, 25, 29,35]. 


Execution. Other approaches [18, 22,28] focus on creating tests from concrete and 
tool-specific counterexamples. In contrast, witness validation does not require 
full counterexamples, but works on more flexible, possibly abstract, violation 
witnesses from a wide range of verification tools. 


Debugging and Visualization. Besides executing a test, it is important to un- 
derstand the cause of the error path [23], and there are tools and methods to 
debug and visualize program paths [2,9, 26]. 


6 Conclusion 


We address the problem of constructing a tool for witness validation in a
systematic and generic way: We developed the concept of validation via
verification, which is a two-step approach that first applies a program
transformation and then applies an off-the-shelf verification tool, without
development effort.

The concept is implemented in the witness validator METAVAL, which has
already been successfully used in SV-COMP 2020. The validation results are
impressive: the new validator enriches the competition's validation capabilities by
164 uniquely confirmed violation results and 834 uniquely confirmed correctness
results, based on the witnesses provided by the verifiers. This paper does not
contain its own evaluation, but refers to results from the recent competition in
the field.

The major benefit of our concept is that it is now possible to configure a
spectrum of validators with different strengths, based on different verification
engines. The 'time to market' of new verification technology into validators is
negligibly small because there is no development effort necessary to construct
new validators from new verifiers. A potential technology bias is also reduced.


Data Availability Statement. All data from SV-COMP 2020 are publicly
available: witnesses [7], verification and validation results as well as log files [5], and
benchmark programs and specifications [6]⁷. The validation statistics in Tables 1
and 2 are available in the archive [5] and on the SV-COMP website⁸. METAVAL 1.0
is available on GitLab⁹ and in our AEC-approved virtual machine [16].


7 https://github.com/sosy-lab/sv-benchmarks/tree/svcomp20
8 https://sv-comp.sosy-lab.org/2020/results/results-verified/validatorStatistics.html
9 https://gitlab.com/sosy-lab/software/metaval/-/tree/1.0
Abstract. SPARK is both a deductive verification tool for the Ada
language and the subset of Ada on which it operates. In this paper, we 
present a recent extension of the SPARK language and toolset to support 
pointers. This extension is based on an ownership policy inspired by 
Rust to enforce non-aliasing through a move semantics of assignment. 
In particular, we consider pointer-based recursive data structures, and 
discuss how they are supported in SPARK. We explain how iteration 
over these structures can be handled using a restricted form of aliasing 
called local borrowing. To avoid introducing a memory model and to stay 
in the first-order logic background of SPARK, the relation between the 
iterator and the underlying structure is encoded as a predicate which 
is maintained throughout the program control flow. Special first-order 
contracts, called pledges, can be used to describe this relation. Finally, 
we give examples of programs that can be verified using this framework. 


Keywords: Deductive verification · Recursive structures · Ownership


1 Introduction 


The programming language SPARK [8] has been designed to be amenable to for- 
mal verification, and one of the most impactful design choices was the exclusion 
of aliasing. While this choice vastly simplified the tool design and improved the 
expected proof performance, it also meant that pointers, as a major source of 
aliasing, were excluded from the language. While SPARK over the years had seen 
the addition of many language features, adding pointers just seemed impossible 
without violating the non-aliasing property. Then came Rust [11] democratizing 
a type system based on ownership [5]. Taking inspiration from it, it was possible 
to add pointers to the language in a way that still excludes aliasing. We will give 
an overview of the rules in this paper. 

However, it was unclear if programs traversing recursive data structures such 
as lists and trees could be supported in this setting. In particular, iteration using 
a loop requires an alias between the traversed structure and the iterator. In this 
paper, we detail an approach, inspired by recent work by Astrauskas et al. [1], 
that enables proofs about recursive pointer-based data structures in SPARK. 
We have implemented this approach in the industrial formal verification tool 
SPARK, and, using this tool, developed a number of examples. Some important 
restrictions remain - we will also discuss them in this paper. 
Ada [2] is a general-purpose procedural programming language. The design
of the Ada language puts great emphasis on the safety and correctness of the 
program. This objective is realized by using a readable syntax that uses keywords 
instead of symbols where reasonable. The type system is strong and strict and 
many potential violations of type constraints can be detected statically by the 
compiler. If not, a run-time check is inserted into the program, to guarantee the 
detection of incorrect situations. 


declare -- Block introducing new declarations
   type My_Int is range -100 .. 100;
   -- User-defined integer type ranging from -100 to 100
   subtype My_Nat is My_Int range 0 .. My_Int'Last;
   -- Subtype of My_Int with additional constraints

   X : My_Int := 50; -- Static check that 50 is in the bounds of My_Int
   Y : My_Nat;
begin -- Part of the block containing statements
   Y := X; -- Dynamic check that X is in the bounds of My_Nat
end; -- End of scope of the entities declared in the block


Ada 2012 introduced contract-based programming to Ada. In particular, it is
possible to attach pre- and postconditions to subprograms¹. These conditions
can be checked during the execution of the program, just like assertions.

SPARK is the name of a tool that provides formal verification for Ada. It
uses the user-provided contracts and attempts to prove that the runtime checks
cannot fail and that postconditions are established by the corresponding
subprograms. As formal verification for the whole Ada language would be
intractable, SPARK is also the name of the subset of the Ada language that is
supported by the SPARK tool². This subset contains almost all features of Ada,
though sometimes in a restricted form. In particular, expressions should be free
from side effects, and aliasing is forbidden (no two variables should share the
same memory location or overlap in memory). This restriction greatly simplifies
the memory model used in the SPARK tool: any program variable can be reasoned
about independently from other variables.

The SPARK tool uses the Why3 platform to generate verification conditions 
for SMT solvers via a weakest-precondition calculus [4]. 


2 Support for Pointers 


Pointers in Ada are called access types. It is possible to declare an access type
using the access keyword. Objects of an access type are null if no initial values
are supplied. It is possible to allocate an object on the heap using the keyword
new. An initial value can be supplied for the allocated object. A dereference of
a pointer is written as a record component access, but using the keyword all.


1 In Ada, a distinction is made between functions that return a value, and procedures,
which do not. Subprogram is the term that designates both.
2 http://docs.adacore.com/spark2014-docs/html/ug/.
declare
   type Int_Acc is access Integer; -- Declare a new access type
   X : Int_Acc; -- Declare an object of this type
   pragma Assert (X = null); -- No initial values provided, X is null
   Y : Integer;
begin
   X := new Integer; -- Allocation of uninitialized data
   X := new Integer'(3); -- Allocation of initialized data
   Y := X.all; -- Dereference the access
end;


When a pointer is dereferenced, a runtime check is introduced to make sure
that it is not null. Ada does not mandate garbage collection. Memory allocated
on the heap can be reclaimed manually by the user using a generic procedure
named Unchecked_Deallocation, which also sets its argument pointer to
null. There are several kinds of access types. The basic access types, like Int_Acc
defined above, are called pool-specific access types. They can only designate
objects allocated on the heap. General access types, introduced by the keyword
all, can also be used to designate objects allocated on the stack or global data.

Pointers were excluded from the SPARK subset until recently. Indeed, allow- 
ing pointers in a straightforward way would break the absence of aliasing in 
SPARK. In addition, pointers are associated with a list of classes of bugs such 
as memory leaks, use-after-free and dereferencing a null-pointer. 

To support pointers in SPARK, we designed a subset of Ada’s access types 
which does not introduce aliasing and avoids some pointer-specific issues, while 
retaining as much expressivity as possible. The first restriction we selected is 
the exclusion of general access types. This means that SPARK can only create 
pointers designating memory allocated on the heap, and not on the stack. As 
a result, pointers can only be made invalid by explicit deallocation, and deal- 
location of a valid pointer is always legal. To eliminate aliasing between (heap) 
pointers, ownership rules inspired by Rust have been added on top of Ada's 
legality rules. These rules enforce a single writer/multiple readers policy. They 
ensure that, when a value designated by a pointer is modified, all other objects 
can be considered to be preserved. 

The basis of the ownership policy of SPARK is the move semantics of
assignments. When a pointer is assigned to a variable, both the source and the
target of the assignment designate the same memory region: assigning an object
containing a pointer creates an alias. To alleviate this problem, when an object
containing a pointer is assigned, the memory region designated by the pointer is
said to be moved. The source of the assignment loses the ownership of the
designated data while the target of the assignment gains it. The ownership system
makes sure that the designated data is not accessed again through the source of
the assignment.


Y : Int_Acc := X; -- Ownership of the data designated by X is moved to Y
Y.all := Y.all + 1; -- The data can be read and modified through Y
Z := X.all; -- Illegal: Reading or modifying X.all is not allowed


As the ownership policy ensures that no aliasing can occur between access
objects, it is possible to reason about the program almost as if the pointer
was replaced by the data it points to. When an object containing a pointer is
assigned to another variable, it is safe to consider that the designated data is
copied by the assignment. Indeed, any effects that could occur because variables
are sharing a substructure cannot be observed because of the ownership rules.

Pointers are handled in the verification model of the SPARK proof tool as 
maybe, or option types: access objects are either null, or they contain a value. 
In addition, access objects also contain an address, which can be used to handle 
comparison (two pointers may not be equal even if the values they designate are 
equal). When a pointer is dereferenced, a verification condition is generated to 
make sure that the pointer is not null, so that its value can be accessed. 


X : Int_Acc; -- X is null
X := new Integer'(3); -- X has a value which is 3
Y := X; -- Y has a value which is 3
Z := Y.all; -- Check that Y is not null, Z is 3


Note that the ownership policy is key for this translation to be correct, as it 
prevents the program from observing side-effects caused by the modification of 
a shared reference, which would not be accounted for in the verification model. 
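
The following Python sketch mimics this verification model; it is an
illustration and not the encoding that the SPARK tool generates. An access
object is modelled as an optional pair of address and value, an assignment is
modelled as a copy, and a dereference carries the non-null check. The class
Access, the function deref, and the concrete addresses are assumptions made for
the example.

```python
import copy
from dataclasses import dataclass
from typing import Optional


@dataclass
class Access:
    """Model of a non-null access object: an address and the designated value."""
    address: int
    value: int


def deref(p: Optional[Access]) -> int:
    # corresponds to the verification condition that the pointer is not null
    assert p is not None, "null dereference"
    return p.value


x: Optional[Access] = None       # X : Int_Acc;           X is null
x = Access(address=1, value=3)   # X := new Integer'(3);  X has the value 3
y = copy.copy(x)                 # Y := X;                modelled as a copy
y.value += 1                     # Y.all := Y.all + 1;
z = deref(y)                     # Z := Y.all;            check that Y is not null

# In the model, the copy leaves the value of x untouched. The ownership policy
# forbids reading X.all after the move, so the program cannot observe this.
assert x.value == 3 and z == 4

# The comparison takes the address into account: equal values do not imply
# equal pointers.
assert Access(address=1, value=3) != Access(address=2, value=3)
```

Without the ownership policy, a program could read X.all after the update
through Y and would see the value 4 at run time, whereas the model would still
report 3.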


3 Recursive Data Structures 


In Ada, recursive types can only be introduced through pointers. The idea is to
first declare a type, but without giving its definition. This declaration, called an 
incomplete declaration, introduces a place-holder for the type, which can only 
be used in restricted circumstances. In particular, this place-holder can be used 
to declare an access type designating pointers to values of this type. Using this 
mechanism, it is possible to declare a recursive data structure, since the access 
type can be used in the type definition as it comes afterward. 


type List_Cell; 
type List is access List_Cell; 
type List_Cell is record 
Data : Integer; 
Next : List; 
end record; 


There are no specific restrictions concerning recursive types in SPARK. However, 
the ownership policy of SPARK implies that it will not be possible to create a 
structure which has either cycles (e.g. doubly linked lists) or shared substructures 
(e.g. DAGs) in it. The ownership policy may also impact how recursive structures 
can be manipulated. In general, working with such structures involves a traversal, 
which can be done either recursively, or iteratively using a loop. Algorithms 
working in a recursive way are generally compliant with the ownership policy of 
SPARK. Indeed, the recursive calls will allow reading or modifying the structure
in depth without having to deconstruct it³.


function Length (L : access constant List_Cell) return My_Nat is
  (if L = null then 0 else Length (L.Next) + 1);

function Nth (L : access constant List_Cell; N : My_Pos) return Integer is
  (if N = 1 then L.Data else Nth (L.Next, N - 1))
with Pre => N < Length (L);


3 In Length and Nth, addition on My_Nat and My_Pos has been redefined to saturate
so as to avoid the overflow checking mandated by Ada.


182 C. Dross and J. Kanig 


Algorithms involving loops are trickier. The declaration of the iterator used
for the loop creates an alias of the traversed data structure. As per SPARK's
ownership policy, this is considered to be a move, so it makes it illegal to access
the initial structure. Each further assignment to the iterator during the
traversal then permanently gives up the ownership of one more node of the
structure, making it impossible to restore the ownership at the end.


procedure Set_All_To_Zero (X : in out List) is
   Y : List := X; -- The ownership of X is transferred to Y
begin
   while Y /= null loop
      Y.Data := 0;
      Y := Y.Next; -- Ownership of the first cell of Y is lost for good
   end loop; -- The ownership of X cannot be restored
end Set_All_To_Zero;


To traverse recursive data structures, a move is not what we want. Here we 
need a way to lend the ownership of a memory region for a period of time and 
automatically restore it at the end. A similar mechanism, called borrowing, is 
available in the Rust language. We have adapted it to SPARK. 


4 Borrowing Ownership 


As Ada is an imperative language, losing the possibility to traverse a linked data 
structure using a loop was deemed too restrictive. To alleviate this problem, a 
notion of ownership borrowing was introduced in SPARK. It allows the users 
to declare a variable, called a borrower, which is initialized with a reference 
to a part of an existing data structure. To state that this initialization should 
not be considered a move, an anonymous access type is used for the borrower⁴.
During the scope of the borrower, the borrowed part of the underlying structure 
is frozen, meaning that it is illegal to read or modify it. Once the borrower has 
gone out of scope, the ownership automatically returns to the borrowed object, 
so that it is again fully accessible. 


X := ...;  -- X is initialized to the list {1,2,3,4}
declare
   Y : access List_Cell := X;  -- Y has an anonymous access type.
   -- Ownership of X is transferred to Y for the duration of its lifetime.
begin
   Y.Data := Y.Data + 1;       -- Y can be used to read or modify X
   pragma Assert (X.Data = 2); -- Illegal, during the lifetime of Y, X
                               -- cannot be read or modified directly
end;
pragma Assert (X.Data = 2);    -- Afterwards, the ownership returns to X


A borrower can be used to modify the underlying structure. This makes it effec- 
tively an alias of the borrowed object. To allow the tool to statically determine 
the cases of aliasing, SPARK restricts the initial value of a local borrower to be 
the name of a part of an existing object. This forbids for example borrowing one 
of two structures depending on a condition. 


⁴ A type is said to be anonymous if it does not have a previous declaration. Here
access List_Cell is anonymous while List is named.

It is possible to update a borrower to change the part of the object it designates
(as opposed to modifying the designated object). This is called a reborrow.
In SPARK, the value assigned to the borrower in a reborrow should be rooted 
at the borrower. This means that reborrows only go deeper into the structure. 


declare
   Y : access List_Cell := X;  -- Y is X
begin
   Y := Y.Next;                -- This is a reborrow, Y is now X.Next
end;


Borrowing can be used to allow simple iterative traversals of a recursive data 
structure like the loop of Set_All_To_Zero. More complex traversals, involving
stacks for example, cannot be written iteratively in SPARK. 

procedure Set_All_To_Zero (X : in out List) is
   Y : access List_Cell := X;
   -- The ownership of X is transferred to Y for the duration of its lifetime
begin
   while Y /= null loop
      Y.Data := 0;
      Y := Y.Next;  -- Reborrow: Y designates something deeper
   end loop;
end Set_All_To_Zero;  -- The ownership of X is restored


Using reborrows, local borrowers allow one to indirectly modify a data structure 
at an arbitrarily-deep position, which may not be statically-known. While in the 
scope of the borrower, these indirect modifications can be ignored by the analysis, 
as the ownership policy makes them impossible to observe. However, after the 
end of the borrow, ownership is transferred back to the borrowed object, and 
SPARK needs to take into account whatever modifications may have occurred 
through the borrower. 


X := ...;  -- X is initialized to the list {1,2,3,4}
declare
   Y : access List_Cell := X;  -- Y is X
begin
   Y := Y.Next.Next;
   -- Through reborrows, Y designates an arbitrarily-deep part of X
   Y.Data := 42;               -- Y is used to indirectly modify X
end;
pragma Assert (X.Next.Next.Data = 42);  -- The assertion should hold


To be able to reconstruct the borrowed object from the value of the borrower,
we must track the relation between them. As this relation cannot be statically 
determined because of reborrows, SPARK handles it as an additional object 
in the program. This allows us to take advantage of the normal mechanism 
for handling value-dependent control flow in SPARK (the weakest-precondition
calculus of Why3). The idea is the following. When a borrower is declared in Ada, 
we create two objects: the borrower itself, which is considered as a stand-alone 
structure, independent of the borrowed object, and a predicate. The predicate, 
which we call the borrow relation, encodes the most precise relation between the 
borrower and the borrowed object which does not depend on the actual value 
designated by the borrower. The value of the borrow relation is computed by 
the tool from the definition of the borrower, and is updated at each reborrow. 
Modifications of the underlying data structure don't impact this relation. At the
end of the borrow, the borrowed object is reconstructed using both the borrow
relation and the current value of the borrower. 


X := ...;  -- X is initialized to the list {1,2,3,4}
declare
   Y : access List_Cell := X;  -- Create borrow relation to relate X and Y
   -- b_rel := λ new_x, new_y. new_x ≠ null ∧ new_x = new_y
begin
   Y := Y.Next.Next;  -- Update the predicate to model the new relation
   -- b_rel := λ new_x, new_y. new_x ≠ null ∧ new_x.data = 1 ∧
   --          new_x.next ≠ null ∧ new_x.next.data = 2 ∧ new_x.next.next ≠ null
   --          ∧ new_x.next.next = new_y
   Y.Data := 42;      -- The borrow relation is not modified
end;
pragma Assert (X.Next.Next.Data = 42);
-- Follows from the fact that X.Next.Next = Y and Y.Data = 42
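
The bookkeeping above can be imitated outside SPARK. The following Python sketch is only an illustration under simplifying assumptions, not the encoding used by the tool: list cells are modelled as immutable pairs (data, next), and the borrow relation is represented in functional form, as a closure that rebuilds the borrowed object from the final value of the borrower. The helper names (from_values, to_values, initial_relation, reborrow_next) are hypothetical.

```python
# Illustrative model only: a list cell is an immutable pair (data, next),
# and None plays the role of null.

def from_values(values):
    """Build a linked list of (data, next) pairs from a Python list."""
    cell = None
    for v in reversed(values):
        cell = (v, cell)
    return cell


def to_values(cell):
    """Flatten a linked list of (data, next) pairs into a Python list."""
    out = []
    while cell is not None:
        out.append(cell[0])
        cell = cell[1]
    return out


def initial_relation():
    """At the declaration Y := X, the relation states new_x = new_y."""
    return lambda new_y: new_y


def reborrow_next(relation, y):
    """Model Y := Y.Next: the current head of Y becomes frozen in the relation."""
    frozen_data = y[0]

    def new_relation(new_y):
        return relation((frozen_data, new_y))

    return new_relation, y[1]


x = from_values([1, 2, 3, 4])
relation, y = initial_relation(), x
relation, y = reborrow_next(relation, y)   # Y := Y.Next
relation, y = reborrow_next(relation, y)   # Y := Y.Next (Y is now X.Next.Next)
y = (42, y[1])                             # Y.Data := 42, relation untouched
x = relation(y)                            # end of the borrow: rebuild X
print(to_values(x))                        # [1, 2, 42, 4]
```

As in the listing, writing through the borrower changes only the borrower, while each reborrow extends the relation with the cell that has just been frozen.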


5 Describing the Borrow Relation 


SPARK performs deductive verification, which relies on user-specified invariants 
to handle loops. When traversing a linked data structure, the loop body contains 
a reborrow, which means that the borrow relation is modified in the loop. As 
a general rule, if a variable is modified in a loop, it should be described in the
loop invariant; otherwise nothing is known about its value afterward. Thus, we need
a way to describe the borrow relation in the loop invariant.

As part of their work on the Prusti proof tool for Rust, Astrauskas et al. found 
the need for a similar annotation that they call pledges [1]. In Rust, a pledge is 
an assertion associated with a borrower which is guaranteed to hold at the time 
when the borrow expires, no matter what may happen in between. In SPARK, a 
property guaranteed to hold at the end of the borrow must be a consequence of 
the borrow relation, since the borrow relation is the most precise relation which 
does not depend on the actual value of the borrower. Therefore, the user-visible 
notion of a pledge is suitable to approximate the internally computed borrow 
relation. Similar to user-provided postconditions, which must be implied by the 
strongest postcondition computed by a verifying tool, the user-provided pledge 
should follow from the borrow relation. 

Since the Ada syntax has no support for pledges, we have resorted in SPARK 
to introducing special functions (dedicated to each access type) called pledge 
functions, which mark expressions which should be considered as pledge expres- 
sions by the tool. A pledge function is a ghost function (meaning that it is not 
allowed to have any effect on the output of the program) which has two param- 
eters. The first one is used to identify the borrower on which the pledge should 
apply, while the second holds the assertion. Note that a call to a pledge function
is not really a call for the SPARK analyzer; it is simply a marker indicating that
the expression passed as its second argument is a pledge.


function Pledge
  (L : access constant Cell;  -- The borrower to which the pledge applies
   P : Boolean)               -- The property we want to assert in the pledge
   return Boolean
is (P)  -- For execution, the function evaluates the property
with Ghost,
     Annotate => (GNATprove, Pledge);  -- Identifies a pledge function for SPARK

When a pledge function is called in an assertion, SPARK recognizes it and identifies
its parameter as a pledge. It therefore attempts to show that the property
is implied by the borrow relation (as opposed to implied by the current value of 
the borrower). 


X := ...;  -- X is initialized to the list {1,2,3,4}
declare
   Y : access List_Cell := X;
begin
   Y := Y.Next.Next;
   pragma Assert (Pledge (Y, Y = X.Next.Next));
   -- True as this is implied by borrow relation
   pragma Assert (Pledge (Y, X.Data = 1 and X.Next.Data = 2));
   -- True again as the first 2 elements of X are frozen
   pragma Assert (Pledge (Y, X.Next.Next.Data = 3));
   -- False, though this is true at the current program point, as it is not
   -- guaranteed to hold at the end of the borrow.
end;


Using pledges, we can formally verify the Set_All_To_Zero procedure. Its postcondition
states that all elements of the list have been set to 0 using the Nth
function. To be able to express the loop invariant in a similar way, we have intro- 
duced a ghost variable C to count the number of iterations. Its value is main- 
tained by the first loop invariant. The second and third invariants are pledges, 
describing how the value of X can be reconstructed from the value of the iterator 
Y. The second invariant gives the length of the list, while the third describes 
the value of its elements using the Nth function. Elements which have already 
been processed are frozen by the borrow. Their value is known to be 0. Other 
elements can be linked to the corresponding position in the iterator Y. 


procedure Set_All_To_Zero (X : List) with
  Pre  => Length (X) < My_Nat'Last,
  Post => Length (X) = Length (X)'Old
    and (for all I in 1 .. Length (X) => Nth (X, I) = 0);
  -- All elements of X are 0 after the call

procedure Set_All_To_Zero (X : List) is
   C : My_Nat := 0 with Ghost;
   Y : access List_Cell := X;
begin
   while Y /= null loop
      pragma Loop_Invariant (C = Length (Y)'Loop_Entry - Length (Y));
      -- C elements have been traversed
      pragma Loop_Invariant
        (Pledge (Y, Length (X) = Length (Y) + C));
      pragma Loop_Invariant
        (Pledge (Y, (for all I in 1 .. Length (X) =>
           Nth (X, I) = (if I <= C then 0 else Nth (Y, I - C)))));
      -- All elements are 0 up to C, others are elements of Y
      Y.Data := 0;
      Y := Y.Next;
      C := C + 1;
   end loop;
end Set_All_To_Zero;
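
The three loop invariants can be exercised dynamically on sample data. The Python sketch below is a test harness and not a proof, and it rests on assumptions made for illustration only: lists are plain Python lists, the frozen prefix is kept explicitly, and X at the end of the borrow is modelled as the frozen prefix followed by whatever value Y has at that time. The two pledges are therefore checked against several candidate future values of Y rather than only the current one. The names check_pledges and set_all_to_zero are hypothetical.

```python
def check_pledges(frozen, c, future_y):
    """Check the two pledge invariants for one possible final value of Y."""
    x = frozen + future_y                    # X as it would be rebuilt at the end
    assert len(x) == len(future_y) + c       # Length (X) = Length (Y) + C
    for i in range(1, len(x) + 1):           # positions are 1-based, as in Nth
        expected = 0 if i <= c else future_y[i - c - 1]
        assert x[i - 1] == expected


def set_all_to_zero(values):
    frozen = []            # cells already traversed, frozen by the borrow
    y = list(values)       # part of the list still designated by Y
    length_at_entry = len(y)
    c = 0                  # ghost counter C
    while y:
        # First invariant: C elements have been traversed.
        assert c == length_at_entry - len(y)
        # Pledges must hold whatever Y becomes, so try a few candidates.
        for future_y in (list(y), [9] * len(y), []):
            check_pledges(frozen, c, future_y)
        y[0] = 0                   # Y.Data := 0
        frozen.append(y.pop(0))    # Y := Y.Next, the old head is now frozen
        c += 1
    return frozen + y              # end of the borrow: X is rebuilt


print(set_all_to_zero([1, 2, 3, 4]))   # [0, 0, 0, 0]
```

A property such as "the next element of X is 3" would fail this kind of check for the candidate value [9, ...], which mirrors why the third assertion of the earlier pledge example is rejected.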


Note that, in general, it is not necessary to write a pledge to verify a program 
using a local borrower. Indeed, the analysis tool is able to precisely track the bor- 
row relation through successive reborrows. Pledges need only be provided when 
the borrow relation itself cannot be tracked by the tool, for example because of 
a loop, like in our example.


6 Evaluation


We could not try the tool on any pre-existing benchmark since SPARK codebases 
do not have pointers, and Ada codebases usually violate some SPARK rules. In 
particular, Ada codebases have no reason to abide by the ownership policy of 
SPARK. So instead, we mostly had to write new tests to assess the correctness 
and performance of our implementation. The public testsuite of SPARK contains 
more than 150 tests mentioning access types, be they supported cases or not. 

To assess expressivity and provability on programs dealing with recursive 
data structures, we have written 6 examples, none of them very big, but ranging 
over various levels of complexity⁵. On all of these examples, we have shown that
the runtime checks imposed by the Ada language are guaranteed to pass and 
that no uninitialized value can be read. In addition, we have manually supplied 
functional properties. 

Figure 1 gives some metrics over these examples. Under the tab Loc are
listed the total number of lines of code in the example, the number of lines of
specification (including contracts and specification functions), and the number of
additional ghost annotations (assertions, loop invariants, ghost variables, ...). The
#Checks column gives the number of checks generated by the tool (contracts,
assertions, invariants, language-defined checks, ...). In the last three columns, we
can see the total running time of SPARK, both from scratch using its default 
strategy and only replaying the proofs through the replay facility, as well as the 
maximal time needed to prove a single verification condition. 


                                     Loc                               Time
Example             #Subp   All   Spec        Ghost       #Checks   Default   Replay   Max VC
set all to zero         5    57    19 (33%)     8 (14%)        25         4        3       <1
linear search           7   136    67 (49%)    24 (17%)       109        10        9       <1
pointer-based maps      7   130    38 (29%)    12 (9%)         64         6        5       <1
route shift             8    99    50 (50%)     3 (3%)         64         9        6        1
binary search          13   239    99 (41%)    42 (17%)       129        24       17        4
red black trees        37   611   107 (17%)   384 (63%)       920       258      152       16


Fig. 1. Overview of the examples involving recursive data structures 


Though these examples are small, we think they demonstrate that it is pos- 
sible to define recursive data structures in SPARK, and to verify iterative pro- 
grams using them. When writing the algorithms, we found that the limitations 
mostly come from the ownership policy of SPARK. Some data structures are not 
supported, requiring either to switch to full Ada for their implementations, or 
to change the algorithm to work around the missing links. In general, we found
that the annotation effort required to describe the borrow relations, though
non-negligible, was acceptable. In particular, it uses the standard SPARK
expressions, with no mentions of memory separation or permission.


⁵ https://github.com/AdaCore/spark2014/tree/master/papers/Pledge2020/examples.


7 Related Work 


Program verification tools for mainstream languages such as C or Java generally 
support aliasing, because the concept of pointer or reference is more central. They 
deal with it by modeling the heap. The WP plugin of Frama-C uses by default a 
typed memory model where different arrays are used for the basic types of C [6]. 
The VerCors [3] toolset handles high-level programming languages, such as Java, 
by extending the annotation language with separation logic with permission [10]. 
In SPARK we have chosen a different approach, as we avoid modeling the heap 
completely by using ownership rules to enforce non-aliasing. 

The ownership rules introduced in SPARK are largely inspired by the Rust 
language [11]. The differences are mostly motivated by the need to comply with 
the preexisting Ada semantics of pointers. In addition, SPARK was aiming at 
coming up with a subset as easy to verify as possible. The resulting model 
is simpler because it does not make lifetime of borrowers explicit, and aliases 
created through borrows are always statically known. 

The Prusti verification tool for Rust [1] allows users to verify that a program 
complies with its specification. Both tools provide similar guarantees and require 
similar annotations. However, they differ in their implementation. Indeed, Prusti 
works by translating separation constraints enforced by the Rust type system 
to the intermediate verification language of the Viper tool [9]. Our work differs 
here, as we use the ownership system to abstract away memory related concerns, 
so that the verification process does not need to be aware of them. 

In a recent work [7], Matsushita et al. propose a translation to CHCs for 
Rust programs. Like in our approach, the restrictions imposed by the ownership 
policy are key for the soundness of their method. However, while we introduce 
the notion of borrow relation to be able to use a standard WP calculus, they 
present a new calculus specifically tailored to Rust references. 


8 Conclusion 


We have presented a recent extension of the SPARK language and toolset to 
support pointers. It is based on an ownership policy enforcing non-aliasing. To 
support pointer-based recursive data structures, a restricted form of aliasing is 
introduced in SPARK through local borrowers, which can be used to iterate 
through a linked data structure in an imperative way. We have described how 
local borrowers can be supported by the verification tool, without introducing 
a memory model, by using a mutable predicate named the borrow relation. 
This borrow relation can be described when necessary using special annotations 
named pledges, which solely consist of SPARK standard expressions, and do not
expose the underlying verification technique. Our work is available in the 20.1
release of SPARK Pro and will be part of the next community release. 

As for future work, we would like to extend the subset of Ada pointers sup- 
ported in SPARK. In particular, we would like to introduce function pointers to 
model callbacks, pointers to constants with a more permissive ownership policy, 
and local borrowing of objects allocated on the stack.


Abstract. Ivy is a multi-modal verification tool for correct design
and implementation of distributed protocols and algorithms, supporting 
modular specification, implementation and proof. Ivy supports proving 
safety and liveness properties of parameterized and infinite-state systems 
via three modes: deductive verification using an SMT solver, abstraction 
and model checking, and manual proofs using natural deduction. It sup- 
ports light-weight formal methods via compositional specification-based 
testing and bounded model checking. Ivy can extract executable dis- 
tributed programs by translation to efficient C++ code. It is designed to 
support decidable automated reasoning, to improve proof stability and 
to provide transparency in the case of proof failures. For this purpose, 
it presents concrete finite counterexamples, automatically audits proofs 
for decidability of verification conditions, and provides modular hiding 
of theories. 


1 Introduction 


Ivy is an open-source [16] multi-modal verification tool for correct design and 
implementation of distributed algorithms, supporting modular specification, 
implementation and proof. The motivating principles of Ivy are predictability, 
stability and transparency. That is, automated proof steps should provide com- 
plexity bounds, should be insensitive to small perturbations, and when they fail 
should provide actionable feedback. To the extent consistent with these princi- 
ples, Ivy aims to maximize expressiveness and proof automation, and thus to 
achieve a high level of user productivity in designing, implementing and prov- 
ing programs. A major goal of Ivy is to support decidable reasoning. That is, 
automated proof should be restricted to logical fragments for which the tool is 
a decision procedure. This greatly improves the stability of automated provers, 
which otherwise rely on fragile heuristics to avoid divergence [28]. This is impor- 
tant for the maintenance of large proofs, to prevent small changes from creat- 
ing unpredictable proof failures. Moreover, on decidable problems, provers fail 
transparently by providing true counterexamples, which greatly simplifies the 
iterative development of proofs. Ivy supports the decomposition of proofs to 
decidable theories by the use of modular abstraction.

The architecture of Ivy is depicted in Fig. 1. The figure shows the major
components of the tool and the information flow between them. Ivy provides a 
language (also called “Ivy”) for the modular description of distributed programs, 
along with their specifications and proofs (see Sect. 2). Ivy is a synchronous,
reactive programming language [3], meaning that the program only executes 
actions in response to input from its environment, and these actions appear to 
execute atomically. From an Ivy program, the tool can extract an asynchronous, 
distributed implementation. A program is made up of reactive modules [1], each 
having a temporal assume/guarantee-style specification. After parsing of this 
description and elaboration of templates, the program is decomposed into its 
component modules, each with associated assumptions and proof obligations, 
according to a system of proof rules for circular assume/guarantee reasoning 
(see Sect. 2.1). 

These proof obligations are passed on to the tactics engine (see Sect. 3). This 
engine orchestrates the use of various built-in proof tactics, including decidable 
invariant checking with an SMT solver (Sect. 3.1), model checking with eager 
abstraction [19] (Sect. 3.2), liveness proof by translation to safety (Sect. 3.3) and 
logical deduction rules (Sect. 3.4). Each tactic works by reducing a given proof 
goal to a (possibly empty) set of sub-goals, from which the original goal can be 
proved. Combined with modular reasoning, the tactics engine makes it possible 
to use a variety of proof approaches and proof automation tools in constructing 
a proof. 

Ivy extracts executable distributed programs by translation to C++ (see 
Sect. 5). From the specifications of a module, Ivy can also generate a modular
randomized specification-based tester [7] (see Sect. 4.1). This also makes it pos- 
sible to test infrastructure not written in Ivy (including hardware) against Ivy 
specifications. 


1.1 Related Work 


Ivy can be thought of as a hybrid between program verification tools such as
ESC-Java [11] and Dafny [14], based on the Floyd/Hoare approach, compositional
model checking tools, such as Mocha [2] and Cadence SMV [17], and proof
assistants based on the LCF model, such as Isabelle [26] or Coq [4]. Compared to
program verification tools that support only procedure modularity, Ivy provides 
a richer form of specification that allows complete hiding of internal state, and 
provides architectural support for decidable reasoning (see Sect. 2.1). Compared 
to compositional tools, Ivy integrates a richer variety of reasoning techniques 
(see Sect. 3). Compared to proof assistants, Ivy provides domain-specific support 
for decidable proof automation, supporting a greater degree of proof automa- 
tion [28]. On the other hand, Ivy relies on a vastly larger trusted computing base 
than typical proof assistants. Moreover, Ivy has no mechanism of reflection, and 
thus cannot be used for meta-reasoning about programs and program transfor- 
mations. In principle, all the techniques in Ivy could be integrated into a tool such 
as Isabelle or Coq, but the effort would be large. A less foundational tool such as
Ivy makes it possible to rapidly experiment with new proof and proof automation
strategies. Compared to all of these tools, Ivy differs in providing native
support for extracting distributed programs, and specification-based testing. A
related tool, mypyvy, focuses on more powerful invariant inference techniques,
but lacks the other features of Ivy [10,29].


Fig. 1. Ivy architecture, showing flow between major components. Red, solid arrows
represent flow of proof goals and assumptions. Green, dashed arrows represent flow of
proofs and/or counterexamples. Not shown is VC generator, shared between Invariant
Checking/BMC and Eager Abstraction components. (Color figure online)


2 A Modular Language for Decidable Reasoning 


The primary design goal of Ivy's language is to support decidable reasoning while
maximizing expressiveness and performance. Figure 2 is an example of the basic
unit of verification in Ivy, called an isolate. An isolate is a reactive module that
hides internal state and provides a temporal (that is, stateful) specification of its
interface. An isolate has named traits that include types, properties, variables
and actions. It is divided into a specification part and an implementation part. 
The figure shows an example of a simple module that inputs a sequence of 
numbers and outputs an upper bound on the numbers received thus far. 


Types, Variables and Actions. The native datatypes in Ivy include just the 
Boolean type, uninterpreted types, records (structs) over datatypes, and pure 
first-order functions. In the figure, line 2 declares an uninterpreted type t. Line 
6 declares a state variable ‘seen’ holding a predicate over t. This variable is 
initialized at line 9. This assigns ‘seen(X)’ to be the function that returns false 
for all values of X.

Procedures in Ivy are called actions and may have side effects on variables.
Parameters are passed by value and there are no references. This greatly simplifies
modular reasoning (see Sect. 2.1) and also allows for aggressive compiler
optimizations due to the absence of aliasing (see Sect. 5).

In the figure, line 3 declares an action ‘ub’ that takes an input x of type t 
and outputs y of type t. Its implementation is given at lines 24 to 27. It updates 
a state variable ‘max’ holding the maximum value received thus far, and returns 
this value by assigning it to the output variable y. 


2.1 Modularity and Decidability 


The specification part of the isolate (lines 5 to 18) consists of ghost variables and 
code that are visible outside the isolate. The implementation part (lines 19 to 30) 
consists of real variables and code that are invisible outside the module. At line 
15 the ghost predicate ‘seen’ is updated to reflect the fact that value x has been 
seen as an input. Specification code contains assume/guarantee specifications in 
terms of require and ensure statements. For example, line 12 represents an 
assumption that input values are non-negative. Line 16 represents a guarantee 
that output values will be an upper bound on all seen values. 

Ghost and real code are kept syntactically separate in Ivy. The specification 
code is interleaved with the implementation code using the directives ‘before’ 
(line 11) and ‘after’ (line 14). Thus, in the figure, the ‘require’ statement acts 
as a precondition, while the ‘ensure’ statement acts as a postcondition. The 
implementation code is not allowed to have side effects on any externally visible state, so
it is sound to erase (or ‘slice’) this code when verifying other modules. Other 
modules see only the ghost code, which provides an abstract model of the isolate. 
Similarly, when extracting executable code, it is safe to erase the ghost code 
(which must be proven to be terminating). This makes it possible, for example, 
to provide a pure, functional specification of a module interface, even though 
internally it has state. 
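
The interleaving of ghost and real code can be pictured as a runtime monitor wrapped around the implementation. The Python sketch below is a loose analogy rather than Ivy semantics or Ivy-generated code: the class Foo, its method names, and the choice of 0 as the initial value of max are assumptions made for illustration. The ghost state seen is updated only by the monitor, and the implementation never touches it.

```python
class Foo:
    """Toy model of the example isolate: real code plus a ghost monitor."""

    def __init__(self):
        self.seen = set()   # ghost state: seen(X) is initially false for all X
        self.max = 0        # implementation state (initial value is an assumption)

    def _implement_ub(self, x):
        # Real code: keep the maximum value received so far and return it.
        self.max = x if x > self.max else self.max
        return self.max

    def ub(self, x):
        # 'before ub': the require statement acts as a precondition.
        assert x >= 0, "require violated: input must be non-negative"
        y = self._implement_ub(x)
        # 'after ub': update the ghost state, then check the postcondition.
        self.seen.add(x)
        assert all(v <= y for v in self.seen), "ensure violated"
        return y


foo = Foo()
print([foo.ub(v) for v in (3, 1, 5, 2)])   # [3, 3, 5, 5]
```

Dropping the two assert statements and the seen set leaves the executable behaviour unchanged, which corresponds to erasing the ghost code during extraction. Dropping _implement_ub and max instead leaves only the abstract view that other modules reason about.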

Theories can also be hidden inside modules. For example, the implementation 
of our example interprets the type t as the integers (line 28). For verification 
purposes, this instantiates the theory of Peano arithmetic for type t. This theory 
is used only to prove correctness of the isolate, and is invisible to other isolates. 
The theory can be used to prove properties (such as the irreflexivity property 
at line 7) that provide an abstraction of the type externally. The ability to hide 
theories behind abstractions provides an important strategy for keeping proof 
obligations decidable. 

An isolate with no implementation part (that is, a “ghost” module) can act 
as an abstract model of a protocol. Using Ivy's modular rules, an abstract model 
can be refined to an implementation, using properties of the abstract model as 
lemmas. In addition to simplifying the proof, abstract models provide another 
useful strategy to hide functions, properties or theories that break decidability. 
This approach, in combination with theory hiding, was used to verify implementations
of distributed consensus protocols [28]. Modularity provides the primary
means in Ivy of keeping the automated reasoning decidable.


 1  isolate foo = {
 2      type t
 3      action ub(x:t) returns (y:t)
 4
 5      specification {
 6          relation seen(X:t)
 7          property ∀X:t. ¬(X < X)
 8          after init {
 9              seen(X) := false;
10          }
11          before ub {
12              require x ≥ 0;
13          }
14          after ub {
15              seen(x) := true;
16              ensure seen(X) → X ≤ y;
17          }
18      }
19      implementation {
20          var max : t
21          after init {
22              max := 0;
23          }
24          implement ub {
25              max := x if x > max else max;
26              y := max;
27          }
28          interpret t → int
29          invariant seen(X) → X ≤ max
30      }
31  }


Fig. 2. Example of an Ivy isolate. 


3 Verification Tactics 


Ivy provides a range of automated tactics for discharging proof goals that are 
selected for their relatively predictable and stable performance, and for the abil- 
ity to fail transparently. 


3.1 Invariant Checking with SMT 


The default tactic for proving safety properties is proof by inductive invariant,
using the SMT solver Z3 [21]. For example, in Fig. 2, the guarantee at line 16 is
proved using the auxiliary inductive invariant at line 29. The invariant relates the
hidden implementation state variable ‘max’ with the visible specification state
variable ‘seen’. An invariant is a property that is required to hold only between
executions of actions of the isolate. That is, actions may temporarily violate an
invariant, but must re-establish it before terminating. The VC (verification condition)
for the isolate holds if all invariants are established by the initializers and
preserved by the interface actions, and if the invariant implies that no assertion
in the code fails. These conditions are verified modulo the visible theories.
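
The three parts of this VC (initiation, preservation, and safety of assertions) can be spelled out on the example of Fig. 2. Ivy discharges the VC symbolically with Z3. The Python sketch below instead enumerates every state over a small bounded domain, an assumption made only to keep the example self-contained, so it illustrates the shape of the check and is not a proof for the unbounded integers. The function names, and the initial value 0 for max, are hypothetical.

```python
from itertools import combinations


def inv(seen, mx):
    """Auxiliary invariant: seen(X) -> X <= max."""
    return all(v <= mx for v in seen)


def check_inductive(bound):
    """Check the invariant on all states whose values lie in range(bound)."""
    values = range(bound)
    subsets = [frozenset(c)
               for r in range(bound + 1)
               for c in combinations(values, r)]

    # Initiation: the initializers establish the invariant.
    assert inv(frozenset(), 0)

    checked = 0
    for seen in subsets:
        for mx in values:
            if not inv(seen, mx):
                continue                 # only states satisfying the invariant
            for x in values:             # every input allowed by 'require x >= 0'
                new_max = x if x > mx else mx
                y = new_max
                new_seen = seen | {x}
                # Safety: the guarantee 'ensure seen(X) -> X <= y' holds.
                assert all(v <= y for v in new_seen)
                # Preservation: the action re-establishes the invariant.
                assert inv(new_seen, new_max)
                checked += 1
    return checked


print(check_inductive(4) > 0)   # True
```

Without the filter on inv, states such as seen = {3} with max = 0 would be explored and the guarantee would fail for x = 1, which is why the auxiliary invariant is needed for the inductive argument.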

Before attempting to prove the VC, the invariance tactic sends it to the 
fragment checker, which determines whether the VC is in a logical fragment 
called FAU [12] for which Z3 is a decision procedure. If the VC is not in FAU, 
Ivy provides an explanation to the user, by pointing to formulas that create a 
function cycle or that violate rules for the use of quantifiers and interpreted 
operators of the visible theories. A function cycle is a cycle in a graph whose 
vertices are types and whose edges are functions (including Skolem functions). 
This transparent mode of failure helps the user to reorganize the proof to keep
the VCs in the decidable fragment.
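
The function-cycle condition can be sketched as a plain graph search. The Python code below covers only that one condition, under the simplifying assumption that each function is unary and is given as a (name, domain type, range type) triple. It does not model Skolem functions introduced by quantifier alternations, nor the rules on interpreted operators, and it is not Ivy's fragment checker. The function name and the example signatures are hypothetical.

```python
def has_function_cycle(functions):
    """Return True if the type graph induced by the functions has a cycle.

    functions: iterable of (name, domain_type, range_type) triples.
    Vertices are types; each function adds an edge from domain to range.
    """
    graph = {}
    for _name, dom, rng in functions:
        graph.setdefault(dom, []).append(rng)
        graph.setdefault(rng, [])

    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on the stack, finished
    color = {t: WHITE for t in graph}

    def visit(t):
        color[t] = GREY
        for nxt in graph[t]:
            if color[nxt] == GREY:        # back edge: a cycle through nxt
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[t] = BLACK
        return False

    return any(color[t] == WHITE and visit(t) for t in graph)


# node -> id and id -> node together close a cycle between the two types.
print(has_function_cycle([("owner", "node", "id"), ("leader", "id", "node")]))  # True
# A single function from node to id is stratified.
print(has_function_cycle([("owner", "node", "id")]))                             # False
```

A reported cycle points at the functions that would have to be hidden behind an abstraction, or moved to another isolate, to bring the VC back into the decidable fragment.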

If a VC in the decidable fragment is false, Z3 fails transparently, producing 
a true finite counter-model, which is in turn translated into an execution trace
that violates an invariant or guarantee. Ivy provides a graphical interactive tool
to help the user in strengthening invariants [25] based on counterexamples. If 
the VC is valid, the tactic discharges the proof goal, returning the empty set of 
subgoals. 


3.2 Eager Abstraction and Model Checking 


An alternative tactic to prove safety properties is model checking with eager 
abstraction [19]. This technique allows parameterized and infinite-state systems 
to be verified with a finite-state model checker. The tactic first propositionally 
strengthens the symbolic transition relation by adding instances of axioms of 
the logic and theories, or of proved properties. It then propositionally abstracts 
the transition relation by converting the atomic predicates to Boolean variables. 
The resulting finite-state abstraction is verified by the ABC model checker [8]. 
If the property is false, the user is presented with an abstract counterexample 
expressed in terms of the truth values of the atomic propositions. The user 
may refine the abstraction by adding instantiation terms or auxiliary invariants. 
In [19] it was shown that this technique can reduce the burden of constructing 
auxiliary invariants, simplifying the overall proof of distributed protocols. As 
an example, the isolate of Fig. 2 can be proved without the auxiliary invariant. 
With eager abstraction, one need not be concerned with function cycles, but on 
the other hand, diagnosing abstract counterexamples can be challenging. 

This approach is consistent with Ivy's philosophy of using stable and transparent 
automation, since the finite-state model checker has a single-exponential 
upper complexity bound and terminates with a proof or a counterexample. This 
is in contrast to more powerful proof engines such as Horn solvers [6] that suf- 
fer from unpredictable divergence. In practice, although eager abstraction is not 
fully automated, it can handle problems that are substantially beyond the capa- 
bilities of current Horn solvers. 


3.3 Liveness-to-Safety Transformation 


Ivy supports proofs of temporal properties, e.g., liveness properties, via a 
liveness-to-safety transformation. Temporal properties are specified in first-order 
linear temporal logic (FO-LTL). The liveness-to-safety tactic reduces a temporal 
proof goal into a safety proof goal, which can then be proven using an induc- 
tive invariant. For finite-state or parameterized systems, any temporal prop- 
erty can be proven by showing the absence of fair cycles, which is a safety 
property [27]. For infinite-state systems such an argument is not sound, and 
Ivy implements dynamic abstraction which generalizes the notion of fair cycles 
to infinite-state systems in a sound and powerful way [23,24]. With dynamic 
abstraction, Ivy’s liveness-to-safety tactic supports temporal proofs of infinite- 
state systems, including both distributed systems with infinite-state per process 
and systems with unbounded parallelism, where new processes can be dynamically 
created so an infinite trace may involve an infinite set of processes. 
 1 isolate bar = {
 2   finite type t
 3   action step(x:t)
 4   specification {
 5     relation pending(X:t)
 6     instance enter : signal
 7
 8     after init {
 9       pending(X) := true;
10     }
11     before step {
12       require pending(x);
13       call enter.raise;
14       pending(x) := false;
15     }
16     temporal property (□◇ enter.now) →
17       ◇ ∀X. ¬pending(X)
18     proof {
19       tactic l2s with
20         invariant ◇ enter.now
21         invariant ($was$ ¬pending(X)) → ¬pending(X)
22         invariant ($happened$ enter.now) →
23           ∃X. ($was$ pending(X)) ∧ ¬pending(X)
24     }
25   }
26 }


Fig. 3. Example of an Ivy isolate with a temporal property. 


The liveness-to-safety tactic fits within Ivy’s philosophy of using decidable 
reasoning. The more standard way of proving liveness properties is to use rank- 
ing functions, but for distributed systems, the required rankings often involve 
cardinalities of sets defined via first-order formulas, resulting in verification con- 
ditions that fall outside FAU and other decidable fragments. In contrast, the 
transformation to safety based on fair cycles and dynamic abstraction results in 
verification conditions which are often in the FAU fragment. Furthermore, since 
the temporal proof is transformed to a safety verification problem, it is possible 
to leverage for liveness proofs all the tactics and mechanisms that Ivy contains 
for safety verification. 

When the liveness-to-safety tactic is applied, Ivy constructs a symbolic cycle 
detection transition system, which tracks fairness constraints and includes a 
shadow or saved copy of the state variables, similar to [5]. For finite-state or 
parameterized systems, it is enough to show that it is not possible to revisit the 
saved state while satisfying all fairness constraints. This can be shown by an 
inductive invariant, and Ivy contains special syntax for writing the invariant of 
the cycle detection system (e.g., to access the saved copy of state variables). For 
infinite-state systems, Ivy’s cycle detection system includes dynamic abstraction, 
and invariants may also refer to the state of the abstraction [23]. 
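
For a finite-state system, the cycle-detection construction described above can be sketched with an explicit-state search in Python. The sketch below models a small pool of pending elements (loosely following the abstract model of Fig. 3), augments each state with a saved copy and a flag recording whether the fairness event has occurred since the save, and then checks the safety property that the saved state is never revisited with the flag set. The idle step, the `remove` switch (used only to produce a deliberately broken variant), and all names are assumptions for illustration; Ivy itself works symbolically with inductive invariants rather than by enumeration.

```python
from typing import FrozenSet, Iterator, List, Optional, Set, Tuple

State = FrozenSet[int]                        # the set of pending elements
# Augmented state: (current, saved copy or None, fairness seen since the save)
Aug = Tuple[State, Optional[State], bool]

ELEMENTS = range(3)                           # a small finite instance of type t


def successors(s: State, remove: bool = True) -> Iterator[Tuple[State, bool]]:
    """Yield (next state, enter_raised). The idle step is a modelling assumption."""
    yield s, False                            # idle: nothing happens
    for x in s:                               # step(x) requires pending(x)
        yield (s - {x} if remove else s), True


def has_fair_cycle(init: State, remove: bool = True) -> bool:
    """Can the saved state be revisited, with something always pending,
    after enter has been raised at least once since the save?"""
    start: Aug = (init, None, False)
    seen: Set[Aug] = {start}
    frontier: List[Aug] = [start]
    while frontier:
        cur, saved, happened = frontier.pop()
        if saved is not None and happened and cur == saved:
            return True                       # a fair cycle has been closed
        nexts: List[Aug] = []
        if saved is None:
            nexts.append((cur, cur, False))   # nondeterministically save the state
        for nxt, raised in successors(cur, remove):
            if nxt:                           # stay where "nothing is pending" is violated
                nexts.append((nxt, saved, happened or (saved is not None and raised)))
        for aug in nexts:
            if aug not in seen:
                seen.add(aug)
                frontier.append(aug)
    return False
```

With `remove=True` every fair step shrinks the pending set, so the saved state can never be revisited; with `remove=False` the search finds a fair cycle immediately.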

Figure 3 shows an example of a simple liveness proof of an abstract model in 
Ivy. The type t (line 2) is declared as finite, which means it is sound to use a 
fair cycle argument without dynamic abstraction. The specification state of the 
system consists of a single unary relation, pending, which is initialized to true 
for all values of type t. The step action (line 11) removes a single value from the 
pending relation. This can model, e.g., execution of tasks from a finite pool of 
pending tasks. The temporal property that we prove (line 16) is that if step is 
called infinitely often, then eventually nothing is pending. At line 13, we detect 
the call by raising a flag enter.now. The proof applies the liveness-to-safety (l2s) 
tactic (line 19), and supplies inductive invariants for the cycle detection system. 
The special operators $was$ and $happened$ are used to refer to the saved state, 
and the fairness constraints, respectively. The crux of the invariant is that after 
 1 axiom eid(X) = eid(Y) → X = Y
 2 axiom mgr(X,Y) ∧ mgr(X,Z) → Y = Z
 3 explicit axiom [mgr_total] ∃Y. mgr(X,Y)
 4 axiom mgr(X,X) → X = ceo
 5
 6 invariant mgr(X,Y) ∧ scanned(Y) → mid(X) = eid(Y)
 7
 8 action get_mid(x:emp) returns (res:id) = {
 9   require ∀Y. scanned(Y);
10   res := mid(x);
11   ensure x ≠ ceo → res ≠ eid(x);
12   proof {
13     assume mgr_total with X = x
14   }
15 }


Fig. 4. Example of manual quantifier instantiation with a tactic 


enter.now has happened, there is some element which was pending in the saved 
state and is not pending anymore, showing that the system has no fair cycle. 


3.4 Logical Tactics 


Though most of a proof in Ivy is done with the above automated proof tactics, 
there are occasional situations in which a small amount of detailed manually- 
guided proof is needed, or is preferable to restructuring the proof. For this 
purpose, Ivy provides logical proof tactics that can be applied to properties, 
invariants or code assertions, either to complete the proof or to reduce it to 
subgoals that can be discharged by the automated tactics. A simple example 
is shown in Fig. 4. Here, mgr(X, Y) indicates that the manager of employee X 
is Y and eid(X) is the employee id of X. We assume that employee ids are 
unique, each employee has exactly one manager and that only the CEO is her 
own manager (lines 1 to 4). Action get_mid(x) returns the id of the manager of 
employee x. For this purpose, a procedure (not shown) scans the employees m 
and sets mid(x) = eid(m) for each x managed by m, establishing the invariant 
at line 6. Action get_mid(x) requires that all employees have been scanned and 
ensures that the return value is not the id of x, unless x is the CEO. 

Axiom mgr_total states that for all employees there exists a manager (the 
universal quantifier on X is implicit). Ivy complains that this quantifier alter- 
nation puts the VC outside the decidable fragment. We can solve this with a 
manual quantifier instantiation. We first tag the axiom explicit, meaning that it 
is not used by the default tactic. We then apply the tactic ‘assume’ (line 13) to 
instantiate this axiom for X = x. The resulting assumption ∃Y. mgr(x, Y) has no 
alternation. The modified proof goal is discharged by the default tactic using Z3. 
Ivy’s proof engine is based on the λΠ calculus [13] and a deterministic 
second-order matching algorithm [30]. The Ivy standard library uses this framework to 
define proof rules for natural deduction, similarly to Isabelle/FOL [26]. Logical 
tactics also make it possible to perform theory reasoning outside the decidable 
fragment, for example, applying the Peano induction axiom. 
4 Light-Weight Formal Methods 


4.1 Compositional Specification-Based Testing 


Before attempting a formal proof that an isolate satisfies its specification, it is 
useful to debug it using testing. For this purpose, Ivy provides compositional 
specification-based testing. The testers that Ivy produces generate randomized 
input sequences for an isolate that satisfy its assumptions and check the outputs 
against the isolate’s guarantees. This is similar in principle to specification-based 
testing tools such as QuickCheck [9], but is reactive and compositional. Composi- 
tionality provides a kind of completeness for unit testing. That is, if a system fails 
its specification, then there is a local test of some component that fails. Unlike 
QuickCheck, Ivy does not require the user to provide generators for datatypes, 
instead relying on SMT solving for this purpose. Ivy can also be used to gener- 
ate specification-based tests for hardware or software systems not written in Ivy. 
For example, it has been used to find bugs in memory hierarchy components for 
RISC-V processors [18], and the QUIC secure Internet transport protocol [20]. 
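
The idea of a specification-based tester can be sketched in a few lines of Python: inputs are drawn at random from those that satisfy the component's assumptions, and each output is checked against the guarantee. The toy component, its deliberately buggy variant, and the use of plain filtering in place of SMT-based input generation are all assumptions made for this sketch; it does not reflect Ivy's generated testers.

```python
import random
from typing import Callable, Set


class PendingPool:
    """Toy component under test (hypothetical, loosely following Fig. 3)."""

    def __init__(self, size: int) -> None:
        self.pending: Set[int] = set(range(size))

    def step(self, x: int) -> None:
        self.pending.discard(x)


class BuggyPool(PendingPool):
    def step(self, x: int) -> None:
        if x != 0:                            # bug: element 0 is never removed
            self.pending.discard(x)


def run_tester(make: Callable[[], PendingPool], size: int,
               steps: int, seed: int) -> bool:
    """Drive the component with random inputs satisfying the assumption
    pending(x) and check the guarantee that exactly x is removed."""
    rng = random.Random(seed)
    comp = make()
    for _ in range(steps):
        # Assumption filter: Ivy relies on SMT solving here; this sketch
        # simply enumerates the inputs that satisfy the precondition.
        legal = [x for x in range(size) if x in comp.pending]
        if not legal:
            break
        x = rng.choice(legal)
        before = set(comp.pending)
        comp.step(x)
        if comp.pending != before - {x}:
            return False                      # guarantee violated
    return True
```

Because the buggy variant can never remove element 0, any sufficiently long run is forced to call `step(0)` and the tester reports the violation regardless of the seed.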


4.2 Bounded and Finite-State Model Checking 


For debugging, Ivy supports bounded model checking. This is decidable if the 
VC’s are in the decidable fragment. It also allows uninterpreted types to be 
finitely instantiated, allowing under-approximate model checking in the style of 
TLC [31]. 
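
The combination of a bound on trace length with a finite instantiation of an uninterpreted type can be illustrated by a small explicit-state checker in Python. The token-ring example, the instantiation of the node type with three elements, and the function names are assumptions for this sketch; it is an under-approximate search in the spirit described above, not Ivy's implementation.

```python
from typing import Callable, Iterable, Iterator, List, Optional, Tuple

State = Tuple[bool, ...]                      # has_token for each node
N = 3                                         # finite instantiation of the node type


def bounded_check(init: Iterable[State],
                  step: Callable[[State], Iterable[State]],
                  safe: Callable[[State], bool],
                  bound: int) -> Optional[List[State]]:
    """Return a shortest counterexample with at most `bound` transitions, or None."""
    init = list(init)
    frontier = [[s] for s in init]
    seen = set(init)
    for _ in range(bound + 1):
        next_frontier = []
        for trace in frontier:
            state = trace[-1]
            if not safe(state):
                return trace
            for nxt in step(state):
                if nxt not in seen:
                    seen.add(nxt)
                    next_frontier.append(trace + [nxt])
        frontier = next_frontier
    return None


def make_step(buggy: bool) -> Callable[[State], Iterator[State]]:
    def step(state: State) -> Iterator[State]:
        for i, has_token in enumerate(state):
            if has_token:
                nxt = list(state)
                nxt[(i + 1) % N] = True       # pass the token to the next node
                if not buggy:
                    nxt[i] = False            # the buggy variant forgets to release it
                yield tuple(nxt)
    return step


def at_most_one_token(state: State) -> bool:
    return sum(state) <= 1


INIT = [(True,) + (False,) * (N - 1)]
```

A result of `None` only means that no violation exists within the bound and for this instantiation; it is a debugging aid rather than a proof.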


5 Extracting Efficient Executable Code 


Compilation. The implementation part of an Ivy program can be extracted as 
executable code in C++. To be extractable, the implementation must satisfy cer- 
tain computability conditions, for example that all quantifiers in conditionals be 
bounded. For functions, the compiler can choose among several representations: 
a closure, a dense representation as an array, or a sparse representation as a hash 
table. The dense representation is unboxed, allowing a cache-efficient contiguous 
representation of an array of structures and reducing allocation overhead. 
Because there are no references in Ivy, there is a risk of copying large struc- 
tures passed as arguments. However, the lack of aliasing makes it relatively easy 
for the compiler to detect linear use of data, allowing call and return by reference 
in the extracted code, and in-place update of structures. Subtype polymorphism 
in Ivy is implemented by the compiler using smart pointers, allowing structure 
sharing (and potentially copy-on-write, though this is not yet implemented). 
In addition, the compiler borrows a technique from the Rust language [22] to 
introduce references. Consider the Ivy code on the left of Fig.5 that looks up a 
value in a map, operates on it, then writes it back into the map. The compiler 
recognizes this as an instance of the “borrowing” pattern and renders it as the 
C++ code on the right, which operates on the value in the map by reference. 
[Left panel: Ivy source, not recoverable from the extraction. Right panel: extracted C++.]

auto &b = m[x];
f(b);


Fig. 5. Updating a map in place using the borrow pattern. 


This is possible because of the lack of aliasing and the fact that the compiler 
understands the underlying data structures. A C++ compiler cannot accomplish 
this optimization because of the difficulty of pointer analysis in the map imple- 
mentation and the called operator f. Benchmarks of an older Ivy compiler [28] 
on distributed protocols showed comparable performance to implementations in 
OCaml and Go, though Ivy is purely value-based, while these languages support 
references. 


Concurrency. Although Ivy is a synchronous reactive language, the compiler can 
extract parameterized distributed programs from Ivy programs in a sound way. 
In a parameterized module, each action and state variable has a first parameter 
representing a location. The compiler verifies that different locations do not 
interfere with each other, and then extracts an executable process that takes its 
location as a parameter. Ivy guarantees that executing the locations concurrently 
is observably equivalent to sequential execution, based on a left-mover/right-mover 
argument [15,28]. 


Run-Time Support. Ivy provides a standard library that includes useful 
abstractions, such as ordered datatypes and arrays, as well as formally specified interfaces 
to networking services provided by operating systems. In addition, the com- 
piler automatically generates marshaling and unmarshaling code for user-defined 
datatypes. These facilities make it relatively straightforward to implement veri- 
fied networked protocols in Ivy. 


6 Conclusion 


Ivy has been designed to provide predictability, stability and transparency in 
the process of developing verified systems. For this purpose, it integrates a col- 
lection of verification techniques that provide these properties, while attempting 
to maximize the expressiveness of the language, the degree of proof automation, 
and the efficiency of extracted code. By setting the division of labor between the 
human and automated provers appropriately, it aims to increase the productivity 
of the overall process of formal development. 
Abstract. We propose an extension of separation logic with fractional 
permissions, aimed at reasoning about concurrent programs that share 
arbitrary regions or data structures in memory. In existing formalisms, 
such reasoning typically either fails or is subject to stringent side condi- 
tions on formulas (notably precision) that significantly impair automa- 
tion. We suggest two formal syntactic additions that collectively remove 
the need for such side conditions: first, the use of both “weak” and 
"strong" forms of separating conjunction, and second, the use of nominal 
labels from hybrid logic. We contend that our suggested alterations bring 
formal reasoning with fractional permissions in separation logic consid- 
erably closer to common pen-and-paper intuition, while imposing only a 
modest bureaucratic overhead. 


Keywords: Separation logic · Permissions · Concurrency · Verification 


1 Introduction 


Concurrent separation logic (CSL) is a version of separation logic designed 
to enable compositional reasoning about concurrent programs that manipu- 
late memory possibly shared between threads [6,26]. Like standard separation 
logic [28], CSL is based on Hoare triples $\{A\}\ C\ \{B\}$, where C is a program and 
A and B are formulas (called the precondition and postcondition of the code 
respectively). The heart of the formalism is the following concurrency rule: 


\[
\frac{\{A_1\}\ C_1\ \{B_1\} \qquad \{A_2\}\ C_2\ \{B_2\}}
     {\{A_1 \circledast A_2\}\ C_1 \parallel C_2\ \{B_1 \circledast B_2\}}
\]


where $\circledast$ is a so-called separating conjunction. This rule says that if two threads 
$C_1$ and $C_2$ are run on spatially separated resources $A_1 \circledast A_2$ then the result will be 
the spatially separated result, $B_1 \circledast B_2$, of running the two threads individually. 

However, since many or perhaps even most interesting concurrent programs 
do share some resources, $\circledast$ typically does not denote strict disjoint separation of 
memories, as it does in standard separation logic (where it is usually written as *). 
© The Author(s) 2020 
Instead, it usually denotes a weaker sort of “separation” designed to ensure that 
the two threads at least cannot interfere with each other's data. This gives rise to 
the idea of fractional permissions, which allow us to divide writeable memory into 
multiple read-only copies by adding a permission value to each location in heap 
memory. In the usual model, due to Boyland [5], permissions are rational numbers 
in the half-open interval (0, 1], with 1 denoting the write permission, and values in 
(0, 1) denoting read-only permissions. We write the formula $A^{\pi}$, where $\pi$ is a 
permission, to denote a “$\pi$ share” of the formula $A$. For example, $(x \mapsto a)^{0.5}$ (typically 
written as $x \overset{0.5}{\mapsto} a$ for convenience) denotes a “half share” of a single heap cell, with 
address $x$ and value $a$. The separating conjunction $A \circledast B$ then denotes heaps realising 
$A$ and $B$ that are “compatible”, rather than disjoint: where the heaps overlap, 
they must agree on the data value, and one adds the permissions at the overlapping 
locations [4]. E.g., at the logical level, we have the entailment: 


\[ x \overset{0.5}{\mapsto} a \circledast x \overset{0.5}{\mapsto} b \models a = b \wedge x \mapsto a. \tag{1} \]


Happily, the concurrency rule of CSL is still sound in this setting (see e.g. [29]). 
However, the use of this weaker notion of separation $\circledast$ causes complications 
for formal reasoning in separation logic, especially if one wishes to reason over 
arbitrary regions of memory rather than individual pointers. There are two 
particular difficulties, as identified by Le and Hobor [24]. The first is that, since 
$\circledast$ denotes possibly-overlapping memories, one loses the main useful feature of 
separation logic: its nonambiguity about separation, which means that desirable 
entailments such as $A^{0.5} \circledast B^{0.5} \models (A \circledast B)^{0.5}$ turn out to be false. E.g.: 


\[ x \overset{0.5}{\mapsto} a \circledast y \overset{0.5}{\mapsto} b \not\models (x \mapsto a \circledast y \mapsto b)^{0.5}. \]


Here, the two “half-pointers” on the LHS might be aliased (x = y and a = b), 
meaning they are two halves of the same pointer, whereas on the RHS they 
must be non-aliased (because we cannot combine two “whole” pointers). This 
ambiguity becomes quite annoying when one adds arbitrary predicate symbols 
to the logic, e.g. to support inductively defined data structures. 

The second difficulty is that although recombining single pointers is 
straightforward, as indicated by Eq. (1), recombining the shares of arbitrary formulae 
is challenging. E.g., $A^{0.5} \circledast A^{0.5} \not\models A$, as shown by the counterexample 


\[ (x \mapsto 1 \vee y \mapsto 2)^{0.5} \circledast (x \mapsto 1 \vee y \mapsto 2)^{0.5} \not\models x \mapsto 1 \vee y \mapsto 2. \]


The LHS can be satisfied by a heap with a 0.5-share of x and a 0.5-share of y, 
whereas the RHS requires a full (1) share of either x or y. 
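
The counterexample above can be replayed concretely. The Python sketch below represents a heap-with-permissions as a dictionary from addresses to (value, permission) pairs, composes two heaps by adding permissions where they overlap, and checks the disjunctive formula directly. Using the strings "x" and "y" as two distinct addresses, and hard-coding the satisfaction check for this one formula, are simplifying assumptions for illustration.

```python
from fractions import Fraction
from typing import Dict, Optional, Tuple

PHeap = Dict[str, Tuple[int, Fraction]]       # address -> (value, permission)
HALF, FULL = Fraction(1, 2), Fraction(1)


def weak_compose(h1: PHeap, h2: PHeap) -> Optional[PHeap]:
    """Union of compatible heaps; permissions add where the heaps overlap."""
    out = dict(h1)
    for loc, (val, perm) in h2.items():
        if loc in out:
            old_val, old_perm = out[loc]
            if old_val != val or old_perm + perm > 1:
                return None                   # incompatible heaps
            out[loc] = (val, old_perm + perm)
        else:
            out[loc] = (val, perm)
    return out


def sat_a(h: PHeap, share: Fraction) -> bool:
    """Does h satisfy (x |-> 1 or y |-> 2) held at the given share?"""
    return h == {"x": (1, share)} or h == {"y": (2, share)}


left = {"x": (1, HALF)}                       # half share via the first disjunct
right = {"y": (2, HALF)}                      # half share via the second disjunct
combined = weak_compose(left, right)          # satisfies the LHS of the entailment
same_origin = weak_compose(left, left)        # both halves come from the same cell
```

Here `combined` holds a half share of each cell and so fails the full formula, whereas `same_origin` recombines to a full pointer; the formula alone cannot distinguish the two situations.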

Le et al. [24] address these problems by a combination of the use of tree shares 
(essentially Boolean binary trees) rather than rational numbers as permissions, 
and semantic restrictions on when the above sorts of permissions reasoning can 
be applied. For example, recombining permissions ($A^{0.5} \circledast A^{0.5} \models A$) is permitted 
only when the formula is precise in the usual separation logic sense (cf. [28]). 
The chief drawback with this approach is the need to repeatedly check these side 
conditions on formulas when reasoning, as well as that said reasoning cannot be 
performed on imprecise formulas. 

Instead, we propose to resolve these difficulties by a different, two-pronged 
extension to the syntax of the logic. First, we propose that the usual “strong” 
separating conjunction *, which enforces the strict disjointness of memory, should 
be retained in the formalism in addition to the weaker $\circledast$. The stronger $*$ supports 
entailments such as $A^{0.5} * B^{0.5} \models (A * B)^{0.5}$, which does not hold when $\circledast$ is 
used instead. Second, we introduce nominal labels from hybrid logic (cf. [3,10]) 
to remember that two copies of a formula have the same origin. We write a 
nominal $\alpha$ to denote a unique heap, in which case entailments such as 
$(\alpha \wedge A)^{0.5} \circledast (\alpha \wedge A)^{0.5} \models \alpha \wedge A$ become valid. We remark that labels have been 
adopted for similar “tracking” purposes in several other separation logic proof 
systems [10,21,23,25]. 

The remainder of this paper aims to demonstrate that our proposed exten- 
sions are (i) weakly necessary, in that expected reasoning patterns fail under 
the usual formalism, (ii) correct, in that they recover the desired logical princi- 
ples, and (iii) sufficient to verify typical concurrent programming patterns that 
use sharing. Section 2 gives some simple examples that motivate our extensions. 
Section 3 then formally introduces the syntax and semantics of our extended for- 
malism. In Sect. 4 we show that our logic obeys the logical principles that enable 
us to reason smoothly with fractional permissions over arbitrary formulas, and 
in Sect. 5 we give some longer worked examples. Finally, in Sect. 6 we conclude 
and discuss directions for future work. 


2 Motivating Examples 


In this section, we aim to motivate our extensions to separation logic with per- 
missions by showing, firstly, how the failures of the logical principles described in 
the introduction actually arise in program verification examples and, secondly, 
how these failures are remedied by our proposed changes. 

The overall context of our work is reasoning about concurrent programs that 
share some data structure or region in memory, which can be described as a 
formula in the assertion language. If $A$ is such a formula then we write $A^{\pi}$ to 
denote a “$\pi$ share” of the formula $A$, meaning informally that all of the pointers 
in the heap memory satisfying $A$ are owned with share $\pi$. The main question 
then becomes how this notion interacts with the separating conjunction $\circledast$. There 
are two key desirable logical equivalences: 


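\[ (A \circledast B)^{\pi} \equiv A^{\pi} \circledast B^{\pi} \tag{I} \]
\[ A^{\pi \oplus \sigma} \equiv A^{\pi} \circledast A^{\sigma} \tag{II} \]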


Equivalence (I) describes distributing a fractional share over a separating con- 
junction, whereas equivalence (II) describes combining two pieces of a previously 
split resource. Both equivalences are true in the left-to-right ($\models$) direction but, as we have seen 
in the Introduction, false in the right-to-left one. Generally speaking, $\circledast$ is like Humpty 
Dumpty: easy to break apart, but not so easy to put back together again. 
The key to understanding the difficulty is the following equivalence: 


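\[ x \overset{\pi}{\mapsto} a \circledast y \overset{\sigma}{\mapsto} b \;\equiv\; (x \overset{\pi}{\mapsto} a * y \overset{\sigma}{\mapsto} b) \vee (x = y \wedge a = b \wedge x \overset{\pi \oplus \sigma}{\mapsto} a) \]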

In other words, either x and y are not aliased, or they are aliased and the 
permissions combine (the additive operation $\oplus$ on rational shares is simply normal 
addition when the sum is $\le 1$ and undefined otherwise). This disjunction 
undermines the notational economies that have led to separation logic's great successes 
in scalable verification [11]; in particular, (I) fails because the left disjunct might 
be true, and (II) fails because the right disjunct might be. At a high level, $\circledast$ is 
a bit too easy to introduce, and therefore also a bit too hard to eliminate. 


2.1 Weak vs. Strong Separation and the Distribution Principle 


One of the challenges of the weak separating conjunction $\circledast$ is that it interacts 
poorly with inductively defined predicates. Consider porting the usual 
separation logic definition of a possibly-cyclic linked list segment from x to y from a 
sequential setting to a concurrent one by a simple substitution of $\circledast$ for $*$: 


\[ \mathsf{ls}\,x\,y =_{\mathrm{def}} (x = y \wedge \mathsf{emp}) \vee (\exists z.\ x \mapsto z \circledast \mathsf{ls}\,z\,y). \]


Now consider a simple recursive procedure foo(x,y) that traverses a linked list 
segment from x to y: 


foo(x,y) { if x=y then return; else foo([x],y); } 


It is easy to see that foo leaves the list segment unchanged, and therefore satisfies 
the following Hoare triple: 


\[ \{(\mathsf{ls}\,x\,y)^{0.5}\}\ \texttt{foo(x,y);}\ \{(\mathsf{ls}\,x\,y)^{0.5}\}. \]


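The intuitive proof of this fact would run approximately as follows: 


$\{(\mathsf{ls}\,x\,y)^{0.5}\}$ foo(x,y) {
  if x=y then return;   $\{(\mathsf{ls}\,x\,y)^{0.5}\}$
  else                  $\{(x \neq y \wedge (x \mapsto z \circledast \mathsf{ls}\,z\,y))^{0.5}\}$
                        $\{x \overset{0.5}{\mapsto} z \circledast (\mathsf{ls}\,z\,y)^{0.5}\}$
    foo([x],y);
                        $\{x \overset{0.5}{\mapsto} z \circledast (\mathsf{ls}\,z\,y)^{0.5}\}$
                        $\{(x \mapsto z \circledast \mathsf{ls}\,z\,y)^{0.5}\}$   (highlighted step)
                        $\{(\mathsf{ls}\,x\,y)^{0.5}\}$
} $\{(\mathsf{ls}\,x\,y)^{0.5}\}$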


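However, because of the use of $\circledast$, the highlighted inference step is not sound: 


\[ x \overset{0.5}{\mapsto} z \circledast (\mathsf{ls}\,z\,y)^{0.5} \not\models (x \mapsto z \circledast \mathsf{ls}\,z\,y)^{0.5}. \tag{2} \]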


Reasoning over Permissions Regions in Concurrent Separation Logic 207 


To see this, consider a heap with the following structure, viewed in two ways: 


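\[ x \overset{0.5}{\mapsto} z \circledast z \overset{0.5}{\mapsto} x \circledast x \overset{0.5}{\mapsto} z \;=\; x \mapsto z \circledast z \overset{0.5}{\mapsto} x \]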


This heap satisfies the LHS of the entailment in (2), as it is the $\circledast$-composition 
of a 0.5-share of $x \mapsto z$ and a 0.5-share of $\mathsf{ls}\,z\,z$, a cyclic list segment from z back 
to itself (note that here z = y). However, it does not satisfy the RHS, since it is 
not a 0.5-share of the $\circledast$-composition of $x \mapsto z$ with $\mathsf{ls}\,z\,z$, which would require 
the pointer to be disjoint from the list segment. 

The underlying reason for the failure of this example is that, in going from 
$(x \mapsto z \circledast \mathsf{ls}\,z\,z)^{0.5}$ to $x \overset{0.5}{\mapsto} z \circledast (\mathsf{ls}\,z\,z)^{0.5}$, we have lost the information that the 
pointer and the list segment are actually disjoint. This is reflected in the general 
failure of the distribution principle $A^{\pi} \circledast B^{\pi} \models (A \circledast B)^{\pi}$, of which the above 
is just one instance. Accordingly, our proposal is that the “strong” separating 
conjunction $*$ from standard separation logic, which forces disjointness of the 
heaps satisfying its conjuncts, should also be retained in the logic alongside $\circledast$, 
on the grounds that (I) is true for the stronger connective: 


\[ (A * B)^{\pi} \equiv A^{\pi} * B^{\pi}. \tag{3} \]


If we then define our list segments using * in the traditional way, namely 


\[ \mathsf{ls}\,x\,y =_{\mathrm{def}} (x = y \wedge \mathsf{emp}) \vee (\exists z.\ x \mapsto z * \mathsf{ls}\,z\,y), \]


then we can observe that this second definition of $\mathsf{ls}$ is identical to the first on 
permission-free formulas, since $\circledast$ and $*$ coincide in that case. However, when we 
replay the verification proof above with the new definition of $\mathsf{ls}$, every $\circledast$ in the 
proof above becomes a $*$, and the proof then becomes sound. Nevertheless, we 
can still use $\circledast$ to describe permission-decomposition of list segments at a higher 
level; e.g., $\mathsf{ls}\,x\,y$ can still be decomposed as $(\mathsf{ls}\,x\,y)^{0.5} \circledast (\mathsf{ls}\,x\,y)^{0.5}$. 


2.2 Nominal Labelling and the Combination Principle 


Unfortunately, even when we use the strong separating conjunction * to define 
list segments ls, a further difficulty still remains. Consider a simple concurrent 
program that runs two copies of foo in parallel on the same list segment: 


foo(x,y); || foo(x,y); 


Since foo only reads from its input list segment, and satisfies the specification 
$\{(\mathsf{ls}\,x\,y)^{0.5}\}$ foo(x,y); $\{(\mathsf{ls}\,x\,y)^{0.5}\}$, this program satisfies the specification 


\[ \{\mathsf{ls}\,x\,y\}\ \texttt{foo(x,y); || foo(x,y);}\ \{\mathsf{ls}\,x\,y\}. \]

Now consider constructing a proof of this specification in CSL. First we view the 
list segment $\mathsf{ls}\,x\,y$ as the $\circledast$-composition of two read-only copies, with 
permission 0.5 each; then we use CSL's concurrency rule (see Sect. 1) to compose the 
specifications of the two threads; last we recombine the two read-only copies to 
obtain the original list segment. The proof diagram is as follows: 


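$\{\mathsf{ls}\,x\,y\}$
$\{(\mathsf{ls}\,x\,y)^{0.5} \circledast (\mathsf{ls}\,x\,y)^{0.5}\}$

  $\{(\mathsf{ls}\,x\,y)^{0.5}\}$          $\{(\mathsf{ls}\,x\,y)^{0.5}\}$
  foo(x,y);          ||     foo(x,y);
  $\{(\mathsf{ls}\,x\,y)^{0.5}\}$          $\{(\mathsf{ls}\,x\,y)^{0.5}\}$

$\{(\mathsf{ls}\,x\,y)^{0.5} \circledast (\mathsf{ls}\,x\,y)^{0.5}\}$
$\{\mathsf{ls}\,x\,y\}$   (highlighted step)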


However, again, the highlighted inference step in this proof is not correct: 


\[ (\mathsf{ls}\,x\,y)^{0.5} \circledast (\mathsf{ls}\,x\,y)^{0.5} \not\models \mathsf{ls}\,x\,y. \tag{4} \]


A countermodel is a heap with the following structure, again viewed in two ways: 
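\[ (x \overset{0.5}{\mapsto} y * y \overset{0.5}{\mapsto} y) \circledast x \overset{0.5}{\mapsto} y \;=\; x \mapsto y \circledast y \overset{0.5}{\mapsto} y \]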


According to the first view of such a heap, it satisfies the LHS of (4), as it is the 
$\circledast$-composition of two 0.5-shares of $\mathsf{ls}\,x\,y$ (one of two cells, and one of a single 
cell). However, it does not satisfy $\mathsf{ls}\,x\,y$, since that would require every cell in 
the heap to be owned with permission 1. 
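
One heap of the kind just described can be checked mechanically. In the Python sketch below, a list segment built from two cells (x to y, then y back to y) and a list segment built from the single cell x to y are each held at share 0.5 and then weakly composed. The dictionary encoding of heaps, the string addresses, and the helper `is_ls` (which follows the strong, disjoint reading of the list-segment definition) are assumptions made for illustration.

```python
from fractions import Fraction
from typing import Dict, Optional, Tuple

PHeap = Dict[str, Tuple[str, Fraction]]       # address -> (next address, permission)
HALF, FULL = Fraction(1, 2), Fraction(1)


def weak_compose(h1: PHeap, h2: PHeap) -> Optional[PHeap]:
    """Union of compatible heaps; permissions add where the heaps overlap."""
    out = dict(h1)
    for loc, (val, perm) in h2.items():
        if loc in out:
            old_val, old_perm = out[loc]
            if old_val != val or old_perm + perm > 1:
                return None
            out[loc] = (val, old_perm + perm)
        else:
            out[loc] = (val, perm)
    return out


def is_ls(h: PHeap, x: str, y: str, share: Fraction) -> bool:
    """Is h exactly a list segment from x to y with every cell held at `share`?"""
    remaining = dict(h)
    cur = x
    while True:
        if cur == y and not remaining:
            return True                       # base case: x = y and the heap is empty
        if cur not in remaining:
            return False
        nxt, perm = remaining.pop(cur)        # consume the cell cur |-> nxt
        if perm != share:
            return False
        cur = nxt


two_cells = {"x": ("y", HALF), "y": ("y", HALF)}
one_cell = {"x": ("y", HALF)}
heap = weak_compose(two_cells, one_cell)
```

Both components are half shares of a list segment from x to y, yet the composed heap owns the cell at y only with permission 0.5 and therefore is not a full list segment.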

As in our previous example, the reason for the failure of this example is that 
we have lost information. In going from $\mathsf{ls}\,x\,y$ to $(\mathsf{ls}\,x\,y)^{0.5} \circledast (\mathsf{ls}\,x\,y)^{0.5}$, we have 
forgotten that the two formulas $(\mathsf{ls}\,x\,y)^{0.5}$ are in fact copies of the same region. 
For formulas $A$ that are precise in that they uniquely describe part of any given 
heap [12], e.g. formulas $x \mapsto a$, this loss of information does not happen and 
we do have $A^{0.5} \circledast A^{0.5} \models A$; but for non-precise formulas such as $\mathsf{ls}\,x\,y$, this 
principle fails. 

However, we regard this primarily as a technical shortcoming of the formal- 
ism, rather than a failure of our intuition. It ought to be true that we can take 
any region of memory, split it into two read-only copies, and then later merge the 
two copies to re-obtain the original region. Were we conducting the above proof 
on pen and paper, we would very likely explain the difficulty away by adopting 
some kind of labelling convention, allowing us to remember that two formulas 
have been obtained from the same memory region by dividing permissions. 

In fact, that is almost exactly our proposed remedy to the situation. We 
introduce nominals, or labels, from hybrid logic, where a nominal $\alpha$ is interpreted 
as denoting a unique heap. Any formula of the form $\alpha \wedge A$ is then precise (in the 
above sense), and so obeys the combination principle 


\[ (\alpha \wedge A)^{\pi} \circledast (\alpha \wedge A)^{\sigma} \models (\alpha \wedge A)^{\pi \oplus \sigma}, \tag{5} \]
where $\oplus$ is addition on permissions. Thus we can repair the faulty CSL proof 
above by replacing every instance of the formula $\mathsf{ls}\,x\,y$ by the “labelled” formula 
$\alpha \wedge \mathsf{ls}\,x\,y$ (and adding an initial step in which we introduce the fresh label $\alpha$). 


2.3 The Jump Modality 


However, this is not quite the end of the story. Readers may have noticed that 
replacing $\mathsf{ls}\,x\,y$ by the “labelled” version $\alpha \wedge \mathsf{ls}\,x\,y$ also entails establishing a 
slightly stronger specification for the function foo, namely: 


\[ \{(\alpha \wedge \mathsf{ls}\,x\,y)^{0.5}\}\ \texttt{foo(x,y);}\ \{(\alpha \wedge \mathsf{ls}\,x\,y)^{0.5}\}. \]


This introduces an extra difficulty in the proof (cf. Sect. 2.1); at the recursive call 
to foo([x],y), the precondition now becomes $\alpha^{0.5} \wedge (x \overset{0.5}{\mapsto} z * (\mathsf{ls}\,z\,y)^{0.5})$, which 
means that we cannot apply separation logic’s frame rule [32] to the pointer 
formula without first weakening away the label-share $\alpha^{0.5}$. 

For this reason, we shall also employ hybrid logic's “jump” modality $@\_$, 
where the formula $@_{\alpha} A$ means that $A$ is true of the heap denoted by the label 
$\alpha$. In the above, we can introduce labels $\beta$ and $\gamma$ for the list components $x \mapsto z$ 
and $\mathsf{ls}\,z\,y$ respectively, whereby we can represent the decomposition of the list 
by the assertion $@_{\alpha}(\beta * \gamma)$. Since this is a pure assertion that does not depend 
on the heap, it can be safely maintained when applying the frame rule, and used 
after the function call to restore the label $\alpha$, using the easily verifiable fact that 


\[ @_{\alpha}(\beta * \gamma) \wedge (\beta * \gamma) \models \alpha. \]


Similar reasoning over labelled decompositions of data structures is seemingly 
necessary whenever treating recursion; we return to it in more detail in Sect. 5. 


3 Separation Logic with Labels and Permissions ($\mathsf{SL}_{\mathsf{LP}}$) 


Following the motivation given in the previous section, here we give the syntax 
and semantics of a separation logic, $\mathsf{SL}_{\mathsf{LP}}$, with permissions over arbitrary 
formulas, making use of both strong and weak separating conjunctions, and nominal 
labels (from hybrid logic [3,10]). First, we define a suitable notion of permissions 
and associated operations. 


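Definition 3.1. A permissions algebra is a tuple $\langle \mathrm{Perm}, \oplus, \otimes, 1 \rangle$, where Perm 
is a set (of “permissions”), $1 \in \mathrm{Perm}$ is called the write permission, and $\oplus$ 
and $\otimes$ are respectively partial and total binary functions on Perm, satisfying 
associativity, commutativity, cancellativity and the following additional axioms: 


\[
\begin{array}{ll}
\pi_1 \oplus \pi_2 \neq \pi_2 & \text{(non-zero)} \\
\forall \pi.\ \pi \oplus 1 \text{ is undefined} & \text{(top)} \\
\forall \pi.\ \exists \pi_1, \pi_2.\ \pi = \pi_1 \oplus \pi_2 & \text{(divisibility)} \\
(\pi_1 \oplus \pi_2) \otimes \pi = (\pi_1 \otimes \pi) \oplus (\pi_2 \otimes \pi) & \text{(left-dist)}
\end{array}
\]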
The most common example of a permissions algebra is the Boyland fractional 
permission model $\langle (0,1] \cap \mathbb{Q}, \oplus, \times, 1 \rangle$, where permissions are rational numbers in 
(0, 1], $\times$ is standard multiplication, and $\oplus$ is standard addition but undefined if 
$\pi + \pi' > 1$. From now on, we assume a fixed but arbitrary permissions algebra. 
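
As a sanity check on the Boyland model, the following Python sketch implements the partial addition and total multiplication on rationals in (0, 1] and tests the four additional axioms of Definition 3.1 on a small grid of sample permissions. The function names and the choice of samples are assumptions for illustration, and a finite check of this kind is of course not a proof.

```python
from fractions import Fraction
from itertools import product
from typing import Optional

ONE = Fraction(1)                             # the write permission


def is_perm(p: Fraction) -> bool:
    return 0 < p <= 1


def add(p: Fraction, q: Fraction) -> Optional[Fraction]:
    """Partial addition: undefined (None) when the sum exceeds 1."""
    total = p + q
    return total if total <= 1 else None


def mul(p: Fraction, q: Fraction) -> Fraction:
    """Total multiplication; the product of two permissions stays in (0, 1]."""
    return p * q


SAMPLES = [Fraction(n, 8) for n in range(1, 9)]   # 1/8, 2/8, ..., 1


def check_axioms() -> bool:
    for p, q, r in product(SAMPLES, repeat=3):
        s = add(p, q)
        if s is not None:
            if s == q:                        # non-zero
                return False
            # left-dist: (p + q) * r = (p * r) + (q * r)
            if mul(s, r) != add(mul(p, r), mul(q, r)):
                return False
    for p in SAMPLES:
        if add(p, ONE) is not None:           # top
            return False
        half = p / 2                          # a divisibility witness
        if not is_perm(half) or add(half, half) != p:
            return False
    return True
```

Exact rational arithmetic via `fractions.Fraction` avoids the rounding issues that floating-point shares such as 0.1 would introduce.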

With the permissions structure in place, we can now define the syntax of 
our logic. We assume disjoint, countably infinite sets Var of variables, Pred of 
predicate symbols (with associated arities) and Label of labels. 


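Definition 3.2. We define formulas of $\mathsf{SL}_{\mathsf{LP}}$ by the grammar: 


\[
\begin{array}{rll}
A ::= & x = y \mid \neg A \mid A \wedge A \mid A \vee A \mid A \to A & \text{(pure)} \\
\mid & \mathsf{emp} \mid x \mapsto y \mid P(\mathbf{x}) \mid A * A \mid A \circledast A \mid A \mathrel{-\!\!*} A \mid A \mathrel{-\!\!\circledast} A & \text{(spatial)} \\
\mid & A^{\pi} \mid \alpha \mid @_{\alpha} A & \text{(perms/labels)}
\end{array}
\]


where $x, y$ range over Var, $\pi$ ranges over Perm, $P$ ranges over Pred, $\alpha$ ranges 
over Label and $\mathbf{x}$ ranges over tuples of variables of length matching the arity of 
the predicate symbol $P$. We write $x \overset{\pi}{\mapsto} y$ for $(x \mapsto y)^{\pi}$, and $x \neq y$ for $\neg(x = y)$. 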


The “magic wands” $-\!\!*$ and $-\!\!\circledast$ are the implications adjoint to $*$ and $\circledast$, as 
usual in separation logic. We include them for completeness, but we use $-\!\!*$ only 
for fairly complex examples (see Sect. 5.3) and in fact do not use $-\!\!\circledast$ at all. 


Semantics. We interpret formulas in a standard model of stacks and heaps- 
with-permissions (cf. [4]), except that our models also incorporate a valuation 
of nominal labels. We assume an infinite set Val of values of which an infinite 
subset $\mathrm{Loc} \subset \mathrm{Val}$ are considered addressable locations. A stack is as usual a map 
$s : \mathrm{Var} \to \mathrm{Val}$. A heap-with-permissions, which we call a p-heap for short, is a 
finite partial function $h : \mathrm{Loc} \rightharpoonup_{\mathrm{fin}} \mathrm{Val} \times \mathrm{Perm}$ from locations to value-permission 
pairs. We write $\mathrm{dom}(h)$ for the domain of $h$, i.e. the set of locations on which $h$ 
is defined. Two p-heaps $h_1$ and $h_2$ are called disjoint if $\mathrm{dom}(h_1) \cap \mathrm{dom}(h_2) = \emptyset$, 
and compatible if, for all $\ell \in \mathrm{dom}(h_1) \cap \mathrm{dom}(h_2)$, we have $h_1(\ell) = (v, \pi_1)$ 
and $h_2(\ell) = (v, \pi_2)$ and $\pi_1 \oplus \pi_2$ is defined. (Thus, trivially, disjoint heaps are also 
compatible.) We define the multiplication $\pi \cdot h$ of a p-heap $h$ by permission $\pi$ by 
extending $\otimes$ pointwise: 


(r-h)() 2(v,n& mv) & h(£) = (vv). 


We also assume that each predicate symbol P of arity k is given a fixed inter- 
pretation [P] € (Val* x PHeaps), where PHeaps is the set of all p-heaps. Here we 
allow an essentially free interpretation of predicate symbols, but they could also 
be given by a suitable inductive definition schema, as is done in many papers on 
separation logic (e.g. [7,8]). Finally, a valuation is a function p : Label —^ PHeaps 
assigning a single p-heap p(a) to each label a. 


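Definition 3.3 (Strong and weak heap composition). The strong com-
position $h_1 \circ h_2$ of two disjoint p-heaps $h_1$ and $h_2$ is defined as their union:

$$
(h_1 \circ h_2)(\ell) = \begin{cases}
h_1(\ell) & \text{if } \ell \notin \mathrm{dom}(h_2) \\
h_2(\ell) & \text{if } \ell \notin \mathrm{dom}(h_1)
\end{cases}
$$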
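$$
\begin{array}{lcl}
s, h, \rho \models x = y & \iff & s(x) = s(y) \\
s, h, \rho \models \neg A & \iff & s, h, \rho \not\models A \\
s, h, \rho \models A \wedge B & \iff & s, h, \rho \models A \text{ and } s, h, \rho \models B \\
s, h, \rho \models A \vee B & \iff & s, h, \rho \models A \text{ or } s, h, \rho \models B \\
s, h, \rho \models A \to B & \iff & s, h, \rho \models A \text{ implies } s, h, \rho \models B \\
s, h, \rho \models \mathsf{emp} & \iff & \mathrm{dom}(h) = \emptyset \\
s, h, \rho \models x \mapsto y & \iff & \mathrm{dom}(h) = \{s(x)\} \text{ and } h(s(x)) = (s(y), 1) \\
s, h, \rho \models P(\mathbf{x}) & \iff & (s(\mathbf{x}), h) \in [\![P]\!] \\
s, h, \rho \models A * B & \iff & \exists h_1, h_2.\ h = h_1 \circ h_2 \text{ and } s, h_1, \rho \models A \text{ and } s, h_2, \rho \models B \\
s, h, \rho \models A \circledast B & \iff & \exists h_1, h_2.\ h = h_1 \circledcirc h_2 \text{ and } s, h_1, \rho \models A \text{ and } s, h_2, \rho \models B \\
s, h, \rho \models A \mathrel{-\!\!*} B & \iff & \forall h'.\ \text{if } h \circ h' \text{ defined and } s, h', \rho \models A \text{ then } s, h \circ h', \rho \models B \\
s, h, \rho \models A \mathrel{-\!\!\circledast} B & \iff & \forall h'.\ \text{if } h \circledcirc h' \text{ defined and } s, h', \rho \models A \text{ then } s, h \circledcirc h', \rho \models B \\
s, h, \rho \models A^{\pi} & \iff & \exists h'.\ h = \pi \cdot h' \text{ and } s, h', \rho \models A \\
s, h, \rho \models \alpha & \iff & h = \rho(\alpha) \\
s, h, \rho \models @_{\alpha} A & \iff & s, \rho(\alpha), \rho \models A
\end{array}
$$

Fig. 1. Definition of the satisfaction relation $s, h, \rho \models A$ for $\mathsf{SL}_{\mathsf{LP}}$.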


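If $h_1$ and $h_2$ are not disjoint then $h_1 \circ h_2$ is undefined.
The weak composition $h_1 \circledcirc h_2$ of two compatible p-heaps $h_1$ and $h_2$ is defined
as their union, adding permissions at overlapping locations:

$$
(h_1 \circledcirc h_2)(\ell) = \begin{cases}
(v, \pi_1 \oplus \pi_2) & \text{if } h_1(\ell) = (v, \pi_1) \text{ and } h_2(\ell) = (v, \pi_2) \\
h_1(\ell) & \text{if } \ell \notin \mathrm{dom}(h_2) \\
h_2(\ell) & \text{if } \ell \notin \mathrm{dom}(h_1)
\end{cases}
$$

If $h_1$ and $h_2$ are not compatible then $h_1 \circledcirc h_2$ is undefined.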
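
The two compositions and the scalar multiplication can be sketched as follows. This is a minimal model under our own assumptions: a p-heap is a Python dict from locations to `(value, permission)` pairs, permissions come from the fractional model, and the function names `strong`, `weak` and `scale` are hypothetical; `None` encodes "undefined".

```python
from fractions import Fraction


def padd(p, q):
    """Permission addition of the fractional model; None when undefined."""
    return p + q if p + q <= 1 else None


def strong(h1, h2):
    """Strong composition: union of disjoint p-heaps, else None."""
    if h1.keys() & h2.keys():
        return None
    return {**h1, **h2}


def weak(h1, h2):
    """Weak composition: union of compatible p-heaps, adding permissions
    at overlapping locations; None if the heaps are incompatible."""
    out = dict(h1)
    for loc, (v, p) in h2.items():
        if loc in out:
            v1, p1 = out[loc]
            total = padd(p1, p)
            if v1 != v or total is None:
                return None
            out[loc] = (v, total)
        else:
            out[loc] = (v, p)
    return out


def scale(pi, h):
    """Pointwise multiplication pi . h of a p-heap by a permission."""
    return {loc: (v, pi * p) for loc, (v, p) in h.items()}


half = Fraction(1, 2)
h = {10: ("a", Fraction(1)), 11: ("b", Fraction(1))}
left, right = scale(half, h), scale(half, h)
assert strong(left, right) is None   # overlapping domains: not disjoint
assert weak(left, right) == h        # compatible: the halves add up to h
assert weak(h, left) is None         # 1 + 1/2 is undefined
```

The last three assertions show the intended contrast: two half-permission copies of the same heap can be composed weakly but never strongly.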


Definition 3.4. The satisfaction relation $s, h, \rho \models A$, where $s$ is a stack, $h$ a
p-heap, $\rho$ a valuation and $A$ a formula, is defined by structural induction on $A$ in
Fig. 1. We write the entailment $A \models B$, where $A$ and $B$ are formulas, to mean
that if $s, h, \rho \models A$ then $s, h, \rho \models B$. We write the equivalence $A \equiv B$ to mean
that $A \models B$ and $B \models A$.
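
To make the satisfaction relation concrete, here is a small evaluator for a fragment of the logic. It is only a sketch under our own assumptions: formulas are nested tuples with hypothetical tags such as `"pts"` and `"perm"`, permissions are fractional, and the weak conjunction and both wands are left out because they quantify over infinitely many heaps.

```python
from fractions import Fraction
from itertools import combinations


def splits(h):
    """All ways to write h as a strong composition h1 o h2."""
    locs = list(h)
    for r in range(len(locs) + 1):
        for left in combinations(locs, r):
            h1 = {l: h[l] for l in left}
            h2 = {l: h[l] for l in locs if l not in left}
            yield h1, h2


def sat(s, h, rho, f):
    """Decide s, h, rho |= f for a fragment (no weak star, no wands)."""
    tag = f[0]
    if tag == "eq":                      # x = y
        return s[f[1]] == s[f[2]]
    if tag == "not":
        return not sat(s, h, rho, f[1])
    if tag == "and":
        return sat(s, h, rho, f[1]) and sat(s, h, rho, f[2])
    if tag == "or":
        return sat(s, h, rho, f[1]) or sat(s, h, rho, f[2])
    if tag == "emp":
        return not h
    if tag == "pts":                     # x |-> y, with the write permission
        return h == {s[f[1]]: (s[f[2]], Fraction(1))}
    if tag == "star":                    # strong separating conjunction
        return any(sat(s, h1, rho, f[1]) and sat(s, h2, rho, f[2])
                   for h1, h2 in splits(h))
    if tag == "perm":                    # A^pi: h = pi . h' and h' |= A
        pi = f[2]
        if any(p / pi > 1 for _, p in h.values()):
            return False                 # no p-heap h' with pi . h' = h
        h_prime = {l: (v, p / pi) for l, (v, p) in h.items()}
        return sat(s, h_prime, rho, f[1])
    if tag == "label":                   # alpha
        return h == rho[f[1]]
    if tag == "at":                      # @_alpha A
        return sat(s, rho[f[1]], rho, f[2])
    raise ValueError(f"unsupported connective: {tag}")


half = Fraction(1, 2)
s = {"x": 1, "y": 2, "a": 7}
rho = {"alpha": {1: (7, Fraction(1)), 2: (7, Fraction(1))}}
h = {1: (7, half), 2: (7, half)}
both = ("star", ("pts", "x", "a"), ("pts", "y", "a"))
assert sat(s, h, rho, ("perm", both, half))            # (x|->a * y|->a)^0.5
assert sat(s, h, rho, ("star", ("perm", both[1], half),
                               ("perm", both[2], half)))
assert sat(s, h, rho, ("at", "alpha", both))           # @_alpha ignores h
assert not sat(s, h, rho, ("label", "alpha"))          # h is half of rho(alpha)
```

The `"perm"` clause relies on multiplication in the fractional model being invertible, so the candidate $h'$ is unique when it exists; other permissions algebras would need a different search.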


4 Logical Principles of $\mathsf{SL}_{\mathsf{LP}}$


In this section, we establish the main logical entailments and equivalences of $\mathsf{SL}_{\mathsf{LP}}$
that capture the various interactions between the separating conjunctions $\circledast$ and
$*$, permissions and labels. As well as being of interest in their own right, many of
these principles will be essential in treating the practical verification examples in
Sect. 5. In particular, the permission distribution principle for $*$ (cf. (3), Sect. 2)
is given in Lemma 4.3, and the permission combination principle for labelled
formulas (cf. (5), Sect. 2) is given in Lemma 4.4.
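Proposition 4.1. The following equivalences all hold in $\mathsf{SL}_{\mathsf{LP}}$:

$$
\begin{array}{ll}
A \circledast B \equiv B \circledast A & A * B \equiv B * A \\
A \circledast (B \circledast C) \equiv (A \circledast B) \circledast C & A * (B * C) \equiv (A * B) * C \\
A \circledast \mathsf{emp} \equiv A & A * \mathsf{emp} \equiv A
\end{array}
$$

Additionally, the following residuation laws hold:

$$
A \models B \mathrel{-\!\!\circledast} C \iff A \circledast B \models C
\qquad\text{and}\qquad
A \models B \mathrel{-\!\!*} C \iff A * B \models C
$$

In addition, we can always weaken $*$ to $\circledast$: $A * B \models A \circledast B$.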


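Next, we establish an additional connection between the two separating con-
junctions $\circledast$ and $*$.


Lemma 4.2 ($\circledast/*$ distribution). For all formulas $A$, $B$, $C$ and $D$,

$$(A \circledast B) * (C \circledast D) \models (A * C) \circledast (B * D). \qquad (\circledast/*)$$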


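Proof. First we show a corresponding model-theoretic property: for any p-heaps
$h_1$, $h_2$, $h_3$ and $h_4$ such that $(h_1 \circledcirc h_2) \circ (h_3 \circledcirc h_4)$ is defined,

$$(h_1 \circledcirc h_2) \circ (h_3 \circledcirc h_4) = (h_1 \circ h_3) \circledcirc (h_2 \circ h_4) \qquad (6)$$

Since $(h_1 \circledcirc h_2) \circ (h_3 \circledcirc h_4)$ is defined by assumption, we have that $h_1 \circledcirc h_2$ and
$h_3 \circledcirc h_4$ are disjoint and that $h_1$ and $h_2$, as well as $h_3$ and $h_4$, are compatible.
In particular, $h_1$ and $h_3$ are disjoint, so $h_1 \circ h_3$ is defined; the same reasoning
applies to $h_2$ and $h_4$. Moreover, any location shared by $h_1 \circ h_3$ and $h_2 \circ h_4$ lies
either in both $h_1$ and $h_2$ or in both $h_3$ and $h_4$, since $h_1 \circledcirc h_2$ and $h_3 \circledcirc h_4$ are
disjoint. As $h_1$ and $h_2$ are compatible, and likewise $h_3$ and $h_4$, the heaps $h_1 \circ h_3$
and $h_2 \circ h_4$ must be compatible and so $(h_1 \circ h_3) \circledcirc (h_2 \circ h_4)$ is defined.

Now, writing $h$ for $(h_1 \circledcirc h_2) \circ (h_3 \circledcirc h_4)$, and letting $\ell \in \mathrm{dom}(h)$, we have

$$
h(\ell) = \begin{cases}
h_1(\ell) & \text{if } \ell \notin \mathrm{dom}(h_3),\ \ell \notin \mathrm{dom}(h_4) \text{ and } \ell \notin \mathrm{dom}(h_2) \\
h_2(\ell) & \text{if } \ell \notin \mathrm{dom}(h_3),\ \ell \notin \mathrm{dom}(h_4) \text{ and } \ell \notin \mathrm{dom}(h_1) \\
(v, \pi_1 \oplus \pi_2) & \text{if } \ell \notin \mathrm{dom}(h_3),\ \ell \notin \mathrm{dom}(h_4),\ h_1(\ell) = (v, \pi_1) \text{ and } h_2(\ell) = (v, \pi_2) \\
h_3(\ell) & \text{if } \ell \notin \mathrm{dom}(h_1),\ \ell \notin \mathrm{dom}(h_2) \text{ and } \ell \notin \mathrm{dom}(h_4) \\
h_4(\ell) & \text{if } \ell \notin \mathrm{dom}(h_1),\ \ell \notin \mathrm{dom}(h_2) \text{ and } \ell \notin \mathrm{dom}(h_3) \\
(u, \pi_3 \oplus \pi_4) & \text{if } \ell \notin \mathrm{dom}(h_1),\ \ell \notin \mathrm{dom}(h_2),\ h_3(\ell) = (u, \pi_3) \text{ and } h_4(\ell) = (u, \pi_4)
\end{cases}
$$

We can merge the first and fourth cases by noting that $h(\ell) = (h_1 \circ h_3)(\ell)$ if $\ell \notin
\mathrm{dom}(h_2 \circ h_4)$, and similarly for the second and fifth cases. We can also rewrite
the two remaining cases (the third and sixth) by observing that $\ell \notin \mathrm{dom}(h_3)$
implies $h_1(\ell) = (h_1 \circ h_3)(\ell)$, and so on, resulting in

$$
h(\ell) = \begin{cases}
(h_1 \circ h_3)(\ell) & \text{if } \ell \notin \mathrm{dom}(h_2 \circ h_4) \\
(h_2 \circ h_4)(\ell) & \text{if } \ell \notin \mathrm{dom}(h_1 \circ h_3) \\
(w, \sigma_1 \oplus \sigma_2) & \text{if } (h_1 \circ h_3)(\ell) = (w, \sigma_1) \text{ and } (h_2 \circ h_4)(\ell) = (w, \sigma_2)
\end{cases}
\;=\; ((h_1 \circ h_3) \circledcirc (h_2 \circ h_4))(\ell).
$$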
Now we show the main result. Suppose $s, h, \rho \models (A \circledast B) * (C \circledast D)$. This gives us
$h = (h_1 \circledcirc h_2) \circ (h_3 \circledcirc h_4)$, where $s, h_1, \rho \models A$ and $s, h_2, \rho \models B$ and $s, h_3, \rho \models C$
and $s, h_4, \rho \models D$. By Eq. (6), we have $h = (h_1 \circ h_3) \circledcirc (h_2 \circ h_4)$, which gives us
exactly that $s, h, \rho \models (A * C) \circledast (B * D)$, as required.
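
Equation (6) can be checked on a concrete instance. The sketch below assumes the fractional permission model and represents p-heaps as dicts; the helper names `strong` and `weak` and the sample heaps are ours, chosen only for illustration.

```python
from fractions import Fraction as F


def strong(h1, h2):
    """Union of disjoint p-heaps; None when the domains overlap."""
    return None if h1.keys() & h2.keys() else {**h1, **h2}


def weak(h1, h2):
    """Union of compatible p-heaps, adding permissions on the overlap."""
    out = dict(h1)
    for loc, (v, p) in h2.items():
        if loc in out:
            v1, p1 = out[loc]
            if v1 != v or p1 + p > 1:
                return None
            out[loc] = (v, p1 + p)
        else:
            out[loc] = (v, p)
    return out


h1 = {1: ("a", F(1, 4)), 2: ("b", F(1, 2))}
h2 = {1: ("a", F(1, 2)), 3: ("c", F(1))}
h3 = {4: ("d", F(1, 3))}
h4 = {4: ("d", F(1, 3)), 5: ("e", F(1))}

lhs = strong(weak(h1, h2), weak(h3, h4))
rhs = weak(strong(h1, h3), strong(h2, h4))
assert lhs is not None and lhs == rhs      # Eq. (6)

# The converse direction can fail: here the right-hand side is defined
# while the left-hand side is not, because g1 and g4 overlap.
g1 = {1: ("a", F(1, 2))}
g4 = {1: ("a", F(1, 2))}
assert weak(strong(g1, {}), strong({}, g4)) == {1: ("a", F(1))}
assert strong(weak(g1, {}), weak({}, g4)) is None
```

The second half of the example illustrates why the lemma is stated as an entailment rather than an equivalence.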


Next, we establish principles for distributing permissions over various con- 
nectives, in particular over the strong *, stated earlier as (3) in Sect. 2. 


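Lemma 4.3 (Permission distribution). The following equivalences hold for
all formulas $A$ and $B$, and permissions $\pi$ and $\sigma$:

$$
\begin{array}{ll}
(A^{\sigma})^{\pi} \equiv A^{\sigma \otimes \pi} & (\otimes) \\
(A \vee B)^{\pi} \equiv A^{\pi} \vee B^{\pi} & (\vee^{\pi}) \\
(A \wedge B)^{\pi} \equiv A^{\pi} \wedge B^{\pi} & (\wedge^{\pi}) \\
(A * B)^{\pi} \equiv A^{\pi} * B^{\pi} & (*^{\pi})
\end{array}
$$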


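Proof. We just show the most interesting case, $(*^{\pi})$. First of all, we establish
a corresponding model-theoretic property: for any permission $\pi$ and disjoint p-
heaps $h_1$ and $h_2$, meaning $h_1 \circ h_2$ is defined,

$$\pi \cdot (h_1 \circ h_2) = (\pi \cdot h_1) \circ (\pi \cdot h_2). \qquad (7)$$

To see this, we first observe that for any $\ell \in \mathrm{dom}(h_1 \circ h_2)$, we have that either
$\ell \in \mathrm{dom}(h_1)$ or $\ell \in \mathrm{dom}(h_2)$. We just show the case $\ell \in \mathrm{dom}(h_1)$, since the other
is symmetric. Writing $h_1(\ell) = (v_1, \pi_1)$, and using the fact that $\ell \notin \mathrm{dom}(h_2)$,

$$(\pi \cdot (h_1 \circ h_2))(\ell) = (v_1, \pi \otimes \pi_1) = (\pi \cdot h_1)(\ell) = ((\pi \cdot h_1) \circ (\pi \cdot h_2))(\ell).$$

Now for the main result, let $s$, $h$ and $\rho$ be given. We have

$$
\begin{array}{cll}
 & s, h, \rho \models (A * B)^{\pi} & \\
\iff & h = \pi \cdot h' \text{ and } s, h', \rho \models A * B & \\
\iff & h = \pi \cdot h' \text{ and } h' = h_1 \circ h_2 \text{ and } s, h_1, \rho \models A \text{ and } s, h_2, \rho \models B & \\
\iff & h = \pi \cdot (h_1 \circ h_2) \text{ and } s, h_1, \rho \models A \text{ and } s, h_2, \rho \models B & \\
\iff & h = (\pi \cdot h_1) \circ (\pi \cdot h_2) \text{ and } s, h_1, \rho \models A \text{ and } s, h_2, \rho \models B & \text{by (7)} \\
\iff & h = h_1' \circ h_2' \text{ and } s, h_1', \rho \models A^{\pi} \text{ and } s, h_2', \rho \models B^{\pi} & \\
\iff & s, h, \rho \models A^{\pi} * B^{\pi}. &
\end{array}
$$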
We now establish the main principles for dividing and combining permissions
formulas using $\circledast$. As foreshadowed in Sect. 2, the combination principle holds
only for formulas that are conjoined with a nominal label (cf. Eq. (5)).


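Lemma 4.4 (Permission division and combination). For all formulas $A$,
nominals $\alpha$, and permissions $\pi_1, \pi_2$ such that $\pi_1 \oplus \pi_2$ is defined:

$$
\begin{array}{ll}
A^{\pi_1 \oplus \pi_2} \models A^{\pi_1} \circledast A^{\pi_2} & \text{(Split $\circledast$)} \\
(\alpha \wedge A)^{\pi_1} \circledast (\alpha \wedge A)^{\pi_2} \models (\alpha \wedge A)^{\pi_1 \oplus \pi_2} & \text{(Join $\circledast$)}
\end{array}
$$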
Proof. Case (Split $\circledast$): Suppose that $s, h, \rho \models A^{\pi_1 \oplus \pi_2}$. We have $h = (\pi_1 \oplus \pi_2) \cdot h'$,
where $s, h', \rho \models A$. That is, for any $\ell \in \mathrm{dom}(h)$, we have $h'(\ell) = (v, \pi)$ say and,
using the permissions algebra axiom (left-dist) from Definition 3.1,

$$h(\ell) = (v, (\pi_1 \oplus \pi_2) \otimes \pi) = (v, (\pi_1 \otimes \pi) \oplus (\pi_2 \otimes \pi)).$$

Now we define p-heaps $h_1$ and $h_2$, both with domain exactly $\mathrm{dom}(h)$, by

$$h_i(\ell) = (v, \pi_i \otimes \pi) \iff h'(\ell) = (v, \pi) \quad \text{for } i \in \{1, 2\}.$$

By construction, $h_1 = \pi_1 \cdot h'$ and $h_2 = \pi_2 \cdot h'$. Since $s, h', \rho \models A$, this gives us
$s, h_1, \rho \models A^{\pi_1}$ and $s, h_2, \rho \models A^{\pi_2}$. Furthermore, also by construction, $h_1$ and $h_2$
are compatible, with $h = h_1 \circledcirc h_2$. Thus $s, h, \rho \models A^{\pi_1} \circledast A^{\pi_2}$, as required.


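Case (Join $\circledast$): First of all, we show that for any p-heap $h$,

$$(\pi_1 \cdot h) \circledcirc (\pi_2 \cdot h) = (\pi_1 \oplus \pi_2) \cdot h. \qquad (8)$$

To see this, we observe that for any $\ell \in \mathrm{dom}(h)$, writing $h(\ell) = (v, \pi)$ say,

$$
\begin{array}{rll}
((\pi_1 \oplus \pi_2) \cdot h)(\ell) & = (v, (\pi_1 \oplus \pi_2) \otimes \pi) & \\
 & = (v, (\pi_1 \otimes \pi) \oplus (\pi_2 \otimes \pi)) & \text{by (left-dist)} \\
 & = (h_1 \circledcirc h_2)(\ell) & \text{where } h_1(\ell) = (v, \pi_1 \otimes \pi) \text{ and } h_2(\ell) = (v, \pi_2 \otimes \pi) \\
 & = ((\pi_1 \cdot h) \circledcirc (\pi_2 \cdot h))(\ell). &
\end{array}
$$

Now, for the main result, suppose $s, h, \rho \models (\alpha \wedge A)^{\pi_1} \circledast (\alpha \wedge A)^{\pi_2}$. We have
$h = h_1 \circledcirc h_2$ where $s, h_1, \rho \models (\alpha \wedge A)^{\pi_1}$ and $s, h_2, \rho \models (\alpha \wedge A)^{\pi_2}$. That is,
$h = (\pi_1 \cdot h_1') \circledcirc (\pi_2 \cdot h_2')$, where $s, h_1', \rho \models \alpha \wedge A$ and $s, h_2', \rho \models \alpha \wedge A$. Thus
$h_1' = h_2' = \rho(\alpha)$ and so, by (8), we have $h = (\pi_1 \oplus \pi_2) \cdot h_1'$, where $s, h_1', \rho \models \alpha \wedge A$.
This gives us $s, h, \rho \models (\alpha \wedge A)^{\pi_1 \oplus \pi_2}$, as required.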
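
The role of the label in (Join $\circledast$) can be seen on a two-cell example. In the sketch below a formula is represented simply by its list of models (p-heaps), locations are named after the variables pointing to them, and permissions are fractional; all of these are our simplifying assumptions, as are the helper names.

```python
from fractions import Fraction


def scale(pi, h):
    """Pointwise multiplication pi . h."""
    return {loc: (v, pi * p) for loc, (v, p) in h.items()}


def weak(h1, h2):
    """Weak composition; None if values clash or a permission sum exceeds 1."""
    out = dict(h1)
    for loc, (v, p) in h2.items():
        if loc in out:
            v1, p1 = out[loc]
            if v1 != v or p1 + p > 1:
                return None
            out[loc] = (v, p1 + p)
        else:
            out[loc] = (v, p)
    return out


def sat_perm(models, pi, h):
    """h satisfies A^pi iff h = pi . h' for some model h' of A."""
    return any(scale(pi, m) == h for m in models)


half, one = Fraction(1, 2), Fraction(1)
hx = {"x": ("a", one)}
hy = {"y": ("a", one)}
A = [hx, hy]          # models of the disjunction  x |-> a  \/  y |-> a

# Unlabelled: two half-copies of A may describe different regions.
h = weak(scale(half, hx), scale(half, hy))
assert sat_perm(A, half, scale(half, hx)) and sat_perm(A, half, scale(half, hy))
assert not sat_perm(A, one, h)      # h models A^0.5 (*) A^0.5 but not A^1

# Labelled: alpha /\ A has the single model rho(alpha), so both halves coincide.
rho_alpha = hx
labelled = [m for m in A if m == rho_alpha]
joined = weak(scale(half, rho_alpha), scale(half, rho_alpha))
assert sat_perm(labelled, one, joined)
```

Without the label the two halves are free to pick different disjuncts, which is exactly the situation the nominal rules out.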


Lastly, we state some useful principles for labels and the “jump” modality. 


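Lemma 4.5 (Labelling and jump). For all formulas $A$ and labels $\alpha$,

$$
\begin{array}{ll}
@_{\alpha} A \wedge \alpha^{\pi} \models A^{\pi} & \text{(@ Elim)} \\
(\alpha \wedge A)^{\pi} \models @_{\alpha} A & \text{(@ Intro)} \\
@_{\alpha}(\beta_1^{\pi} * \beta_2^{\sigma}) \wedge (\beta_1^{\pi} \circledast \beta_2^{\sigma}) \models \alpha \wedge (\beta_1^{\pi} * \beta_2^{\sigma}) & (@/*/\circledast)
\end{array}
$$

Proof. We just show the case $(@/*/\circledast)$, the others being easy. Suppose $s, h, \rho \models
@_{\alpha}(\beta_1^{\pi} * \beta_2^{\sigma}) \wedge (\beta_1^{\pi} \circledast \beta_2^{\sigma})$, meaning that $s, \rho(\alpha), \rho \models \beta_1^{\pi} * \beta_2^{\sigma}$ and $s, h, \rho \models
\beta_1^{\pi} \circledast \beta_2^{\sigma}$. Then we have $\rho(\alpha) = (\pi \cdot \rho(\beta_1)) \circ (\sigma \cdot \rho(\beta_2))$, while $h = (\pi \cdot \rho(\beta_1)) \circledcirc
(\sigma \cdot \rho(\beta_2))$. Since $\circ$ is defined only when its arguments are disjoint p-heaps, we
obtain that $h = \rho(\alpha) = (\pi \cdot \rho(\beta_1)) \circ (\sigma \cdot \rho(\beta_2))$. Thus $s, h, \rho \models \alpha \wedge (\beta_1^{\pi} * \beta_2^{\sigma})$.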
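$$
\frac{\{A_1\}\ C_1\ \{B_1\} \qquad \{A_2\}\ C_2\ \{B_2\}}{\{A_1 \circledast A_2\}\ C_1 \parallel C_2\ \{B_1 \circledast B_2\}}\ (\S)\ \text{(Par)}
\qquad
\frac{\{\alpha \wedge A\}\ C\ \{B\}}{\{A\}\ C\ \{B\}}\ (\P)\ \text{(Label)}
$$

$$
\frac{\{A\}\ C\ \{B\}}{\{A * F\}\ C\ \{B * F\}}\ (\dagger, \ddagger)\ \text{(Frame $*$)}
\qquad
\frac{\{A\}\ C\ \{B\}}{\{A \circledast F\}\ C\ \{B \circledast F\}}\ (\dagger)\ \text{(Frame $\circledast$)}
$$

$(\S)$ $\mathrm{ModVars}(C_2) \cap \mathrm{FreeVars}(A_1, B_1) = \mathrm{ModVars}(C_1) \cap \mathrm{FreeVars}(A_2, B_2) = \emptyset$
$(\P)$ $\alpha$ fresh    $(\dagger)$ $\mathrm{ModVars}(C) \cap \mathrm{FreeVars}(F) = \emptyset$    $(\ddagger)$ see §5.3


Fig. 2. The key CSL proof rules used in our examples; not shown are standard rules for
consequence, conditionals, load/store, etc. The fresh-labelling rule (Label) and com-
bination of both weak (Frame $\circledast$) and strong (Frame $*$) frame rules are novel to our
approach. We require weak conjunction $\circledast$ for the parallel rule (Par).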


5 Concurrent Program Verification Examples 


In this section, we demonstrate how $\mathsf{SL}_{\mathsf{LP}}$ can be used in conjunction with the
usual principles of CSL to construct verification proofs of concurrent programs,
taking three examples of increasing complexity.

Our examples all operate on binary trees in memory, defined as usual in
separation logic (again note the use of $*$ rather than $\circledast$):

$$\mathsf{tree}(x) =_{\mathrm{def}} (x = \mathsf{null} \wedge \mathsf{emp}) \vee (\exists d, l, r.\ x \mapsto (d, l, r) * \mathsf{tree}(l) * \mathsf{tree}(r)).$$
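
Read operationally, the definition says that a heap satisfies tree(x) exactly when it consists of the cells of a binary tree rooted at x and nothing else, with the two subtrees occupying disjoint cells. A minimal checker, under our own simplifying assumptions (each heap cell stores a whole triple `(d, l, r)`, `None` plays the role of null, and permissions are ignored), might look like this; the name `is_tree` is hypothetical.

```python
def is_tree(h, x):
    """True iff heap h consists of exactly the cells of a binary tree rooted at x."""
    seen = set()

    def visit(node):
        if node is None:              # null pointer: the emp disjunct
            return True
        if node not in h or node in seen:
            return False              # dangling pointer, or a cell shared by two subtrees
        seen.add(node)
        _, left, right = h[node]
        return visit(left) and visit(right)

    # The footprint must be the whole heap, as required by the points-to and emp clauses.
    return visit(x) and seen == set(h)


heap = {1: ("d", 2, 3), 2: ("e", None, None), 3: ("f", None, None)}
assert is_tree(heap, 1)
assert not is_tree(heap, 2)                       # extra cells 1 and 3 remain
assert not is_tree({1: ("d", 2, 2), 2: ("e", None, None)}, 1)   # shared child
```

The `seen` set enforces the disjointness demanded by the strong $*$: a cell reachable along two paths is rejected.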


Our proofs employ (a subset of) the standard rules of CSL—with the most impor- 
tant being the concurrency rule from the Introduction, the separation logic frame 
rules for both * and &, and a new rule enabling us to introduce fresh labels into 
the precondition of a triple (similar to the way Hoare logic usually handles exis- 
tential quantifiers). These key rules are shown in Fig. 2. We simplify our Hoare 
triple to remove elements to handle function call/return and furthermore omit 
the presentation of the standard collection of rules for consequence, load, store, 
if-then-else, assignment, etc.; readers interested in such aspects can consult [1]. 
Both of our frame rules have the usual side condition on modified program vari-
ables. The strong frame rule (Frame $*$) has an additional side condition that will
be discussed in Sect. 5.3; until then it is trivially satisfied.


5.1 Parallel Read 


Consider the following program: 


check(x) {
  if (x == null) { return; }
  read(x); || read(x);
}
This is intended to be a straightforward example where we take a tree rooted
at x and, if x is non-null, split into parallel threads that run the program read
on x, and whose specification is $\{\alpha^{\pi} \wedge \mathsf{tree}(x)^{\sigma}\}$ read(x) $\{\alpha^{\pi} \wedge \mathsf{tree}(x)^{\sigma}\}$. We
prove that check satisfies the specification $\{\mathsf{tree}(x)^{\pi}\}$ check(x) $\{\mathsf{tree}(x)^{\pi}\}$; the
verification proof is in Fig. 3. The proof makes use of the basic operations of our
theory: labelling, splitting and joining. The example follows precisely these steps,
starting by labelling the formula $\mathsf{tree}(x)^{\pi} \wedge x \neq \mathsf{null}$ with $\alpha$. The concurrency
rule (Par) allows us to put formulas back together after the parallel call, and the
two copies $(\alpha \wedge \mathsf{tree}(x)^{\pi})^{0.5}$ that were obtained are glued back together to yield
$\mathsf{tree}(x)^{\pi}$, since they have the same label.


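{ $\mathsf{tree}(x)^{\pi}$ }
check(x) {
  { $(\mathsf{tree}(x)^{\pi} \wedge x = \mathsf{null}) \vee (\mathsf{tree}(x)^{\pi} \wedge x \neq \mathsf{null})$ }
  if (x == null) { { $x = \mathsf{null} \wedge \mathsf{tree}(x)^{\pi}$ }
    return;
    { $\mathsf{tree}(x)^{\pi}$ }
  }
  { $\alpha \wedge \mathsf{tree}(x)^{\pi} \wedge x \neq \mathsf{null}$ }                                                  by (Label)
  { $(\alpha \wedge \mathsf{tree}(x)^{\pi})^{0.5} \circledast (\alpha \wedge \mathsf{tree}(x)^{\pi})^{0.5}$ }          by (Split $\circledast$)
    { $(\alpha \wedge \mathsf{tree}(x)^{\pi})^{0.5}$ }
    { $\alpha^{0.5} \wedge \mathsf{tree}(x)^{\pi \otimes 0.5}$ }                                                        by ($\wedge^{\pi}$), ($\otimes$)
    read(x);
    { $\alpha^{0.5} \wedge \mathsf{tree}(x)^{\pi \otimes 0.5}$ }
    { $(\alpha \wedge \mathsf{tree}(x)^{\pi})^{0.5}$ }                                                                  by ($\wedge^{\pi}$), ($\otimes$)
  { $(\alpha \wedge \mathsf{tree}(x)^{\pi})^{0.5} \circledast (\alpha \wedge \mathsf{tree}(x)^{\pi})^{0.5}$ }          by (Par)
  { $\alpha \wedge \mathsf{tree}(x)^{\pi}$ }                                                                            by (Join $\circledast$)
}
{ $\mathsf{tree}(x)^{\pi}$ }


Fig. 3. Verification proof of program check in Example 5.1.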


5.2 Parallel Tree Processing (Le and Hobor [24]) 


Consider the following program, which was also employed as an example in [24]: 


proc(x) {
  if (x == null) { return; }
  print(x->d);     print(x->d);
  proc(x->l);  ||  proc(x->l);
  proc(x->r);      proc(x->r);
}


This code takes a tree rooted at x and, if x is non-null, splits
into parallel threads that call proc recursively on its left and right
branches. We prove, in Fig. 4, that proc satisfies the specification
$\{\alpha \wedge \mathsf{tree}(x)^{\pi}\}$ proc(x) $\{\alpha \wedge \mathsf{tree}(x)^{\pi}\}$. First we unroll the definition of $\mathsf{tree}(x)$
and distribute the permission over Boolean connectives and $*$. If the tree is
empty the process stops. Otherwise, we label each component with a new label
and introduce the "jump" statement $@_{\alpha}(\beta_1 * \beta_2 * \beta_3)$, recording the decompo-
sition of the tree into its three components. Since such statements are pure, i.e.
independent of the heap, we can "carry" this formula along our computation
without interfering with the frame rule(s). Now that every subregion is labelled,
we split the formula into two copies, each with half share, but after distributing
0.5 over $*$ and $\wedge$ we end up with half shares in the labels as well. We relabel each
subregion with new "whole" labels, and again introduce pure @-formulas that
record the relation between the old and the new labels. At this moment we enter
the parallel threads and recursively apply proc to the left and right subtrees of x.
Assuming the specification of proc for subtrees of x, we then retrieve the original
label $\alpha$ from the trail of crumbs left by the @-formulas. We can then recombine
the $\alpha$-labelled threads using (Join $\circledast$) to arrive at the desired postcondition.


5.3  Cross-thread Data Transfer 


Our previous examples involve only “isolated tank” concurrency: a program has 
some resources and splits them into parallel threads that do not communicate 
with each other before—remembering Humpty Dumpty!—ultimately re-merging. 
For our last example, we will show that our technique is expressive enough to 
handle more sophisticated kinds of sharing, in particular inter-thread coarse- 
grained communication. We will show that we can not only share read-only 
data, but in fact prove that one thread has acquired the full ownership of a 
structure, even when the associated root pointers are not easily exposed. 

To do so, we add some communication primitives to our language, together
with their associated Hoare rules. Coarse-grained concurrency primitives such as
locks, channels, and barriers have been well investigated in various flavours of
concurrent separation logic [19,26,31]. We will use a channel for our example in this
section but with simplified rules: the Hoare rule for a channel $c$ to send message
number $i$ whose message invariant is $R^c_i$ is $\{R^c_i(x)\}$ send(c, x) $\{\mathsf{emp}\}$, while the
corresponding rule to receive is $\{\mathsf{emp}\}$ receive(c) $\{\lambda ret.\ R^c_i(ret)\}$. We ignore
details such as identifying which party is allowed to send/receive at a given
time [14] or the resource ownership of the channel itself [18].

These rules interact poorly with the strong frame rule from Fig. 2:

$$
\frac{\{A\}\ C\ \{B\}}{\{A * F\}\ C\ \{B * F\}}\ (\dagger, \ddagger)\ \text{(Frame $*$)}
\qquad
\begin{array}{l}
(\dagger)\ \mathrm{ModVars}(C) \cap \mathrm{FreeVars}(F) = \emptyset \\
(\ddagger)\ C \text{ does not receive resources}
\end{array}
$$

The revealed side condition $(\ddagger)$ means that $C$ does not contain any subcommands
that "transfer in" resources, such as unlock, receive, etc.; this side condition
is a bit stronger than necessary but has a simple definition and can be checked
syntactically. Without $(\ddagger)$, we can reach a contradiction. Assume that the current
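{ $\alpha \wedge \mathsf{tree}(x)^{\pi}$ }
proc(x) {
  { $(\alpha \wedge (x = \mathsf{null} \wedge \mathsf{emp})^{\pi}) \vee (\alpha \wedge (x \mapsto (d,l,r) * \mathsf{tree}(l) * \mathsf{tree}(r))^{\pi})$ }      by ($\wedge^{\pi}$), ($\vee^{\pi}$)
  { $(\alpha \wedge x = \mathsf{null} \wedge \mathsf{emp}) \vee (\alpha \wedge (x \mapsto (d,l,r) * \mathsf{tree}(l) * \mathsf{tree}(r))^{\pi})$ }
  if (x == null) { { $\alpha \wedge x = \mathsf{null} \wedge \mathsf{emp}$ }
    return;
    { $\alpha \wedge (x = \mathsf{null} \wedge \mathsf{emp})^{\pi}$ }
    { $\alpha \wedge \mathsf{tree}(x)^{\pi}$ }
  }
  { $\alpha \wedge (x \overset{\pi}{\mapsto} (d,l,r) * \mathsf{tree}(l)^{\pi} * \mathsf{tree}(r)^{\pi})$ }      by ($*^{\pi}$)
  { $((\beta_1 \wedge x \overset{\pi}{\mapsto} (d,l,r)) * (\beta_2 \wedge \mathsf{tree}(l)^{\pi}) * (\beta_3 \wedge \mathsf{tree}(r)^{\pi}))$
    $\wedge\ @_{\alpha}(\beta_1 * \beta_2 * \beta_3)$ }      by (Label), (@ Intro)
  { $(((\beta_1^{0.5} \wedge x \overset{\pi \otimes 0.5}{\mapsto} (d,l,r)) * (\beta_2^{0.5} \wedge \mathsf{tree}(l)^{\pi \otimes 0.5}) * (\beta_3^{0.5} \wedge \mathsf{tree}(r)^{\pi \otimes 0.5})) \wedge (@_{\alpha}(\beta_1 * \beta_2 * \beta_3))^{0.5})\ \circledast$
    $(((\beta_1^{0.5} \wedge x \overset{\pi \otimes 0.5}{\mapsto} (d,l,r)) * (\beta_2^{0.5} \wedge \mathsf{tree}(l)^{\pi \otimes 0.5}) * (\beta_3^{0.5} \wedge \mathsf{tree}(r)^{\pi \otimes 0.5})) \wedge (@_{\alpha}(\beta_1 * \beta_2 * \beta_3))^{0.5})$ }      by (Split $\circledast$), ($*^{\pi}$), ($\wedge^{\pi}$)
  { $(((\gamma_1 \wedge x \overset{\pi \otimes 0.5}{\mapsto} (d,l,r) \wedge @_{\gamma_1}\beta_1^{0.5}) * (\gamma_2 \wedge \mathsf{tree}(l)^{\pi \otimes 0.5} \wedge @_{\gamma_2}\beta_2^{0.5}) * (\gamma_3 \wedge \mathsf{tree}(r)^{\pi \otimes 0.5} \wedge @_{\gamma_3}\beta_3^{0.5})) \wedge @_{\alpha}(\beta_1 * \beta_2 * \beta_3))\ \circledast$
    $(((\gamma_1 \wedge x \overset{\pi \otimes 0.5}{\mapsto} (d,l,r) \wedge @_{\gamma_1}\beta_1^{0.5}) * (\gamma_2 \wedge \mathsf{tree}(l)^{\pi \otimes 0.5} \wedge @_{\gamma_2}\beta_2^{0.5}) * (\gamma_3 \wedge \mathsf{tree}(r)^{\pi \otimes 0.5} \wedge @_{\gamma_3}\beta_3^{0.5})) \wedge @_{\alpha}(\beta_1 * \beta_2 * \beta_3))$ }      by (Label), (@ Intro)
    { $((\gamma_1 \wedge x \overset{\pi \otimes 0.5}{\mapsto} (d,l,r) \wedge @_{\gamma_1}\beta_1^{0.5}) * (\gamma_2 \wedge \mathsf{tree}(l)^{\pi \otimes 0.5} \wedge @_{\gamma_2}\beta_2^{0.5}) * (\gamma_3 \wedge \mathsf{tree}(r)^{\pi \otimes 0.5} \wedge @_{\gamma_3}\beta_3^{0.5})) \wedge @_{\alpha}(\beta_1 * \beta_2 * \beta_3)$ }
    print(x->d);
    { $((\gamma_1 \wedge x \overset{\pi \otimes 0.5}{\mapsto} (d,l,r) \wedge @_{\gamma_1}\beta_1^{0.5}) * (\gamma_2 \wedge \mathsf{tree}(l)^{\pi \otimes 0.5} \wedge @_{\gamma_2}\beta_2^{0.5}) * (\gamma_3 \wedge \mathsf{tree}(r)^{\pi \otimes 0.5} \wedge @_{\gamma_3}\beta_3^{0.5})) \wedge @_{\alpha}(\beta_1 * \beta_2 * \beta_3)$ }
    proc(x->l);
    { $((\gamma_1 \wedge x \overset{\pi \otimes 0.5}{\mapsto} (d,l,r) \wedge @_{\gamma_1}\beta_1^{0.5}) * (\gamma_2 \wedge \mathsf{tree}(l)^{\pi \otimes 0.5} \wedge @_{\gamma_2}\beta_2^{0.5}) * (\gamma_3 \wedge \mathsf{tree}(r)^{\pi \otimes 0.5} \wedge @_{\gamma_3}\beta_3^{0.5})) \wedge @_{\alpha}(\beta_1 * \beta_2 * \beta_3)$ }
    proc(x->r);
    { $((\gamma_1 \wedge x \overset{\pi \otimes 0.5}{\mapsto} (d,l,r) \wedge @_{\gamma_1}\beta_1^{0.5}) * (\gamma_2 \wedge \mathsf{tree}(l)^{\pi \otimes 0.5} \wedge @_{\gamma_2}\beta_2^{0.5}) * (\gamma_3 \wedge \mathsf{tree}(r)^{\pi \otimes 0.5} \wedge @_{\gamma_3}\beta_3^{0.5})) \wedge @_{\alpha}(\beta_1 * \beta_2 * \beta_3)$ }
    { $((\beta_1^{0.5} \wedge x \overset{\pi \otimes 0.5}{\mapsto} (d,l,r)) * (\beta_2^{0.5} \wedge \mathsf{tree}(l)^{\pi \otimes 0.5}) * (\beta_3^{0.5} \wedge \mathsf{tree}(r)^{\pi \otimes 0.5})) \wedge (@_{\alpha}(\beta_1 * \beta_2 * \beta_3))^{0.5}$ }
    { $(((\beta_1 \wedge x \overset{\pi}{\mapsto} (d,l,r)) * (\beta_2 \wedge \mathsf{tree}(l)^{\pi}) * (\beta_3 \wedge \mathsf{tree}(r)^{\pi})) \wedge @_{\alpha}(\beta_1 * \beta_2 * \beta_3))^{0.5}$ }      by ($\wedge^{\pi}$), ($*^{\pi}$)
    { $(\alpha \wedge (x \overset{\pi}{\mapsto} (d,l,r) * \mathsf{tree}(l)^{\pi} * \mathsf{tree}(r)^{\pi}))^{0.5}$ }      by ($@/*/\circledast$)
  { $(\alpha \wedge (x \overset{\pi}{\mapsto} (d,l,r) * \mathsf{tree}(l)^{\pi} * \mathsf{tree}(r)^{\pi}))^{0.5}\ \circledast$
    $(\alpha \wedge (x \overset{\pi}{\mapsto} (d,l,r) * \mathsf{tree}(l)^{\pi} * \mathsf{tree}(r)^{\pi}))^{0.5}$ }      by (Par)
  { $\alpha \wedge (x \overset{\pi}{\mapsto} (d,l,r) * \mathsf{tree}(l)^{\pi} * \mathsf{tree}(r)^{\pi})$ }      by (Join $\circledast$)
}
{ $\alpha \wedge \mathsf{tree}(x)^{\pi}$ }


Fig. 4. Verification proof of Le and Hobor's program from [24] in Example 5.2.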
100 void transfer(int key) { { $\mathsf{emp}$ }
101   tree* rt = make_tree();
102   { $\mathsf{tree}(rt)$ }

      { $(\alpha \wedge \mathsf{tree}(rt))^{0.5}$ }          ||   { $(\alpha \wedge \mathsf{tree}(rt))^{0.5}$ }
      tree* sub = find(rt, key);                             ||   tree* sub = receive(ch);
      send(ch, sub);                                          ||   modify(sub);
      receive(ch);                                            ||   send(ch, ());
      { $(\epsilon \wedge \mathsf{tree}(rt))^{0.5}$ }        ||   { $(\epsilon \wedge \mathsf{tree}(rt))^{0.5}$ }

400   { $\mathsf{tree}(rt)$ }
401   delete_tree(rt); { $\mathsf{emp}$ } }


Fig. 5. Verification proof of the top and bottom of transfer in Example 5.3.


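message invariant $R^c_i$ is $x \overset{0.5}{\mapsto} a$, which has been sent by thread B. Now thread A,
which had the other half of $x \overset{0.5}{\mapsto} a$, can reason as follows:

$$
\frac{\{\mathsf{emp}\}\ \texttt{receive(c)}\ \{x \overset{0.5}{\mapsto} a\}}
     {\{\mathsf{emp} * x \overset{0.5}{\mapsto} a\}\ \texttt{receive(c)}\ \{x \overset{0.5}{\mapsto} a * x \overset{0.5}{\mapsto} a\}}
\ \text{(Frame $*$), without } (\ddagger)
$$

The postcondition is a contradiction as no location strongly separates from itself.
However, given $(\ddagger)$ the strong frame rule can be proven by induction.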
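
To spell out why that postcondition is unsatisfiable, one can unfold the semantics of Fig. 1. The short derivation below is our own working, using only the clauses for $*$, permissions and points-to; the final line recalls, for contrast, what the weak conjunction gives in the fractional model.

```latex
\begin{align*}
s, h, \rho \models x \overset{0.5}{\mapsto} a * x \overset{0.5}{\mapsto} a
  &\iff \exists h_1, h_2.\ h = h_1 \circ h_2 \text{ and }
        s, h_i, \rho \models (x \mapsto a)^{0.5} \text{ for } i \in \{1, 2\} \\
  &\implies \mathrm{dom}(h_1) = \mathrm{dom}(h_2) = \{s(x)\} \\
  &\implies h_1 \text{ and } h_2 \text{ are not disjoint, so } h_1 \circ h_2 \text{ is undefined.} \\[4pt]
s, h, \rho \models x \overset{0.5}{\mapsto} a \circledast x \overset{0.5}{\mapsto} a
  &\iff h = h_1 \circledcirc h_2 \text{ with } h_i = \{ s(x) \mapsto (s(a), 0.5) \}
   \iff s, h, \rho \models x \mapsto a.
\end{align*}
```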

The consequence of $(\ddagger)$, from a verification point of view, is that when
resources are transferred in they arrive weakly separated, by $\circledast$, since we must use
the weak frame rule around the receiving command. The troublesome issue is
that this newly "arriving" state can thus $\circledast$-overlap awkwardly with the existing
state. Fortunately, judicious use of labels can sort things out.

Consider the code in Fig. 5. The basic idea is simple: we create some data
at the top (line 101) and then split its ownership 50-50 to two threads. The left
thread finds a subtree, and passes its half of that subtree to the right via a chan-
nel. The right thread receives the root of that subtree, and thus has full ownership
of that subtree along with half-ownership of the rest of the tree. Accordingly,
the right thread can modify that subtree before notifying the left thread and
passing half of the modified subtree back. After merging, full ownership of the
entire tree is restored and so on line 401 the program can delete it. Figure 5 only
contains the proof and line numbers for the top and bottom shared portions.
The left and the right thread's proofs appear in Fig. 6.

By this point the top and bottom portions of the verification are straight-
forward. After creating the tree $\mathsf{tree}(rt)$ at line 102, we introduce the label $\alpha$,
split the formula using (Split $\circledast$), and then pass $(\alpha \wedge \mathsf{tree}(rt))^{0.5}$ to both threads.
After the parallel execution, due to the call to modify(sub) in the right thread,
the tree has changed in memory. Accordingly, the label for the tree must also
change as indicated by the $(\epsilon \wedge \mathsf{tree}(rt))^{0.5}$ in both threads after parallel process-
ing. These are then recombined on line 400 using the re-combination principle
(Join $\circledast$), before the tree is deallocated via standard sequential techniques.
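200 { $(\alpha \wedge \mathsf{tree}(rt))^{0.5}$ }
201 tree* sub = find(rt, key);
202 { $\alpha^{0.5} \wedge (\mathsf{tree}(sub) * (\mathsf{tree}(sub) \mathrel{-\!\!*} \mathsf{tree}(rt)))^{0.5}$ }
203 { $\alpha^{0.5} \wedge ((\beta \wedge \mathsf{tree}(sub)) * (\gamma \wedge (\mathsf{tree}(sub) \mathrel{-\!\!*} \mathsf{tree}(rt))))^{0.5}$ }
204 { $\alpha^{0.5} \wedge ((\beta \wedge \mathsf{tree}(sub)) * (\gamma \wedge (\mathsf{tree}(sub) \mathrel{-\!\!*} \mathsf{tree}(rt))))^{0.5}\ \wedge$
      $(@^{0.5}_{\alpha}((\beta \wedge \mathsf{tree}(sub)) * (\gamma \wedge (\mathsf{tree}(sub) \mathrel{-\!\!*} \mathsf{tree}(rt))))^{0.5})$ }
205 { $(\beta \wedge \mathsf{tree}(sub))^{0.5} * (\gamma \wedge (\mathsf{tree}(sub) \mathrel{-\!\!*} \mathsf{tree}(rt)))^{0.5}\ \wedge$
      $(@^{0.5}_{\alpha}((\beta \wedge \mathsf{tree}(sub)) * (\gamma \wedge (\mathsf{tree}(sub) \mathrel{-\!\!*} \mathsf{tree}(rt))))^{0.5})$ }
206 { $(\gamma \wedge (\mathsf{tree}(sub) \mathrel{-\!\!*} \mathsf{tree}(rt)))^{0.5} \circledast ((\beta \wedge \mathsf{tree}(sub))^{0.5}\ \wedge$
      $(@^{0.5}_{\alpha}((\beta \wedge \mathsf{tree}(sub)) * (\gamma \wedge (\mathsf{tree}(sub) \mathrel{-\!\!*} \mathsf{tree}(rt))))^{0.5}))$ }
207 send(ch, sub);
208 { $(\gamma \wedge (\mathsf{tree}(sub) \mathrel{-\!\!*} \mathsf{tree}(rt)))^{0.5}$ }

210 { $(\gamma \wedge (\mathsf{tree}(sub) \mathrel{-\!\!*} \mathsf{tree}(rt)))^{0.5}$ }
211 receive(ch);
212 { $(\gamma \wedge (\mathsf{tree}(sub) \mathrel{-\!\!*} \mathsf{tree}(rt)))^{0.5} \circledast ((@^{0.5}_{\gamma}((\delta \wedge \mathsf{tree}(sub)) \mathrel{-\!\!*} (\epsilon \wedge \mathsf{tree}(rt)))^{0.5})\ \wedge$
      $\gamma \perp \delta \wedge (\delta \wedge \mathsf{tree}(sub))^{0.5})$ }
213 { $(\gamma \wedge ((\delta \wedge \mathsf{tree}(sub)) \mathrel{-\!\!*} (\epsilon \wedge \mathsf{tree}(rt))))^{0.5} \circledast (\delta \wedge \mathsf{tree}(sub))^{0.5} \wedge \gamma \perp \delta$ }
214 { $((\delta \wedge \mathsf{tree}(sub)) \mathrel{-\!\!*} (\epsilon \wedge \mathsf{tree}(rt)))^{0.5} * (\delta \wedge \mathsf{tree}(sub))^{0.5}$ }
215 { $(\epsilon \wedge \mathsf{tree}(rt))^{0.5}$ }


300 { $(\alpha \wedge \mathsf{tree}(rt))^{0.5}$ }

302 { $(\alpha \wedge \mathsf{tree}(rt))^{0.5}$ }
303 tree* sub = receive(ch);
304 { $(\alpha \wedge \mathsf{tree}(rt))^{0.5} \circledast ((\beta \wedge \mathsf{tree}(sub))^{0.5}\ \wedge$
      $(@^{0.5}_{\alpha}((\beta \wedge \mathsf{tree}(sub)) * (\gamma \wedge (\mathsf{tree}(sub) \mathrel{-\!\!*} \mathsf{tree}(rt))))^{0.5}))$ }
305 { $((\beta \wedge \mathsf{tree}(sub)) * (\gamma \wedge (\mathsf{tree}(sub) \mathrel{-\!\!*} \mathsf{tree}(rt))))^{0.5} \circledast (\beta \wedge \mathsf{tree}(sub))^{0.5}$ }
306 { $((\beta \wedge \mathsf{tree}(sub))^{0.5} \circledast (\beta \wedge \mathsf{tree}(sub))^{0.5}) * (\gamma \wedge (\mathsf{tree}(sub) \mathrel{-\!\!*} \mathsf{tree}(rt)))^{0.5}$ }
307 { $\mathsf{tree}(sub) * (\gamma \wedge (\mathsf{tree}(sub) \mathrel{-\!\!*} \mathsf{tree}(rt)))^{0.5}$ }
308 modify(sub);
309 { $\mathsf{tree}(sub) * (\gamma \wedge (\mathsf{tree}(sub) \mathrel{-\!\!*} \mathsf{tree}(rt)))^{0.5}$ }
310 { $(\delta \wedge \mathsf{tree}(sub)) * (\gamma \wedge ((\delta \wedge \mathsf{tree}(sub)) \mathrel{-\!\!*} (\epsilon \wedge \mathsf{tree}(rt))))^{0.5} \wedge \gamma \perp \delta$ }
311 { $((\delta \wedge \mathsf{tree}(sub))^{0.5} \circledast (\delta \wedge \mathsf{tree}(sub))^{0.5}) * (\gamma \wedge ((\delta \wedge \mathsf{tree}(sub)) \mathrel{-\!\!*} (\epsilon \wedge \mathsf{tree}(rt))))^{0.5}\ \wedge$
      $\gamma \perp \delta \wedge (@^{0.5}_{\gamma}((\delta \wedge \mathsf{tree}(sub)) \mathrel{-\!\!*} (\epsilon \wedge \mathsf{tree}(rt)))^{0.5})$ }
312 { $((\delta \wedge \mathsf{tree}(sub))^{0.5} * (\gamma \wedge ((\delta \wedge \mathsf{tree}(sub)) \mathrel{-\!\!*} (\epsilon \wedge \mathsf{tree}(rt))))^{0.5})\ \circledast$
      $((\delta \wedge \mathsf{tree}(sub))^{0.5} \wedge \gamma \perp \delta \wedge (@^{0.5}_{\gamma}((\delta \wedge \mathsf{tree}(sub)) \mathrel{-\!\!*} (\epsilon \wedge \mathsf{tree}(rt)))^{0.5}))$ }
313 { $(\epsilon \wedge \mathsf{tree}(rt))^{0.5}\ \circledast$
      $((\delta \wedge \mathsf{tree}(sub))^{0.5} \wedge \gamma \perp \delta \wedge (@^{0.5}_{\gamma}((\delta \wedge \mathsf{tree}(sub)) \mathrel{-\!\!*} (\epsilon \wedge \mathsf{tree}(rt)))^{0.5}))$ }
314 send(ch, ());
315 { $(\epsilon \wedge \mathsf{tree}(rt))^{0.5}$ }


Fig. 6. Verifications of the left (top) and right (bottom) threads of transfer.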


Let us now examine the more interesting proofs of the individual threads in
Fig. 6. Line 201 calls the find function, which searches a binary tree for a subtree
rooted with key key. Following Cao et al. [13] we specify find as follows:

$$\{\mathsf{tree}(x)^{\pi}\}\ \texttt{find(x)}\ \{\lambda ret.\ (\mathsf{tree}(ret) * (\mathsf{tree}(ret) \mathrel{-\!\!*} \mathsf{tree}(x)))^{\pi}\}$$
Here $ret$ is bound to the return value of find, and the postcondition can be
considered to represent the returned subtree $\mathsf{tree}(ret)$ separately from the tree-
with-a-hole $\mathsf{tree}(ret) \mathrel{-\!\!*} \mathsf{tree}(x)$, using a $*/{-\!\!*}$ style to represent replacement as
per Hobor and Villard [20]. This is the invariant on line 202.

Line 203 then attaches the fresh labels $\beta$ and $\gamma$ to the $*$-separated subparts,
and line 204 snapshots the formula current at label $\alpha$ using the @ operator; $@^{\pi}_{\alpha} P$
should be read as "when one has a $\pi$-fraction of $\alpha$, $P$ holds"; it is definable using
@ and an existential quantifier over labels. On line 205 we forget (in the left
thread) the label $\alpha$ for the current heap for housekeeping purposes, and then
on line 206 we weaken the strong separating conjunction $*$ to the weak one $\circledast$
before sending the root of the subtree sub on line 207.

In the transfer program, the invariant for the first channel message is

$$(\beta \wedge \mathsf{tree}(sub))^{0.5} \wedge (@^{0.5}_{\alpha}((\beta \wedge \mathsf{tree}(sub)) * (\gamma \wedge (\mathsf{tree}(sub) \mathrel{-\!\!*} \mathsf{tree}(rt))))^{0.5})$$

In other words, half of the ownership of the tree rooted at sub plus the (pure)
@-fact about the shape of the heap labeled by $\alpha$. Comparing lines 206 and 208 we
can see that this information has been shipped over the wire (the @-information
has been dropped since no longer needed). The left thread then continues to
process until synchronizing again with the receive in line 211.

Before we consider the second synchronization, however, let us instead jump
to the corresponding receive in the right thread at line 303. After the receive,
the invariant on line 304 has the (weakly separated) resources sent from the left
thread on line 206. We then "jump" label $\alpha$ using the @-information to reach
line 305. We can redistribute the $\beta$ inside the $*$ on line 306 since we already know
that $\beta$ and $\gamma$ are disjoint. On line 307 we reach the payoff by combining both
halves of the subtree sub, enabling the modification of the subtree in line 308.

On line 310 we label the two subheaps, and specialize the magic wand so that
given the specific heap $\delta$ it will yield the specific heap $\epsilon$; we also record the pure
fact that $\gamma$ and $\delta$ are disjoint, written $\gamma \perp \delta$. On line 311 we snapshot $\gamma$ and split
the tree sub 50-50; then on line 312 we push half of sub out of the strong $*$. On
line 313 we combine the subtree and the tree-with-hole to reach the final tree $\epsilon$.
We then send on line 314 with the channel's second resource invariant:

$$(\delta \wedge \mathsf{tree}(sub))^{0.5} \wedge \gamma \perp \delta \wedge (@^{0.5}_{\gamma}((\delta \wedge \mathsf{tree}(sub)) \mathrel{-\!\!*} (\epsilon \wedge \mathsf{tree}(rt)))^{0.5})$$

After the send, on line 315 we have reached the final fractional tree $\epsilon$.

Back in the left-hand thread, the second send is received in line 211, leading
to the weakly-separated postcondition in line 212. In line 213 we "jump" label
$\gamma$, and then in line 214 we use the known disjointness of $\gamma$ and $\delta$ to change the
$\circledast$ to $*$. Finally in line 215 we apply the magic wand to reach the postcondition.


6 Conclusions and Future Work 


We propose an extension of separation logic with fractional permissions [4] in
order to reason about sharing over arbitrary regions of memory. We identify two
fundamental logical principles that fail when the "weak" separating conjunc-
tion $\circledast$ is used in place of the usual "strong" $*$, the first being distribution of
permissions, $A^{\pi} \circledast B^{\pi} \not\models (A \circledast B)^{\pi}$, and the second being the re-combination of
permission-divided formulas, $A^{\pi} \circledast A^{\sigma} \not\models A^{\pi \oplus \sigma}$. We avoid the former difficulty
by retaining the strong $*$ in the formalism alongside $\circledast$, and the latter by using
nominal labels, from hybrid logic, to record exact aliasing between read-only
copies of a formula.

The main previous work addressing these issues, by Le and Hobor [24], uses a 
combination of permissions based on tree shares [17] and semantic side conditions 
on formulas to overcome the aforementioned problems. The rely-guarantee sepa- 
ration logic in [30] similarly restricts concurrent reasoning to structures described 
by precise formulas only. In contrast, our logic is a little more complex, but we 
can use permissions of any kind, and do not require side conditions. In addition, 
our use of labelling enables us to handle examples involving the transfer of data 
structures between concurrent threads. 

On the other hand, we think it probable that the kind of examples we consider 
in this paper could also be proven by hand in at least some of the verification 
formalisms derived from CSL (e.g. [16,22,27]). For example, using the “concur- 
rent abstract predicates" in [16], one can explicitly declare shared regions of 
memory in a fairly ad-hoc way. However, such program logics are typically very 
complicated and, we believe, quite unlikely to be amenable to automation. 

We feel that the main appeal of the present work lies in its relative 
simplicity—we build on standard CSL with permissions and invoke only a modest 
amount of extra syntax—which bodes well for its potential automation (at least 
for simpler examples). In practical terms, an obvious way to proceed would be
to develop a prototype verifier for concurrent programs based on our logic $\mathsf{SL}_{\mathsf{LP}}$.
An important challenge in this area is to develop heuristics—e.g., for splitting, 
labelling and combining formulas—that work acceptably well in practice. 

An even greater challenge is to move from verifying user-provided specifi- 
cations to inferring them automatically, as is done e.g. by Facebook INFER. In 
separation logic, this crucially depends on solving the biabduction problem, which 
aims to discover “best fit” solutions for applications of the frame rule [9,11]. In 
the CSL setting, a further problem seems to lie in deciding how applications of 
the concurrency rule should divide resources between threads. 

Finally, automating the verification approach set out in this paper will likely 
necessitate restricting our full logic to some suitably tractable fragment, e.g. 
one analogous to the well-known symbolic heaps in standard separation logic 
(cf. [2, 15]). The identification of such tractable fragments is another important 
theoretical problem in this area. It is our hope that this paper will serve to 
stimulate interest in the automation of concurrent separation logic in particular, 
and permission-sensitive reasoning in general. 
Abstract. There has been a large body of work on local reasoning for
proving the absence of bugs, but none for proving their presence. We 
present a new formal framework for local reasoning about the presence of 
bugs, building on two complementary foundations: 1) separation logic and 
2) incorrectness logic. We explore the theory of this new incorrectness sep- 
aration logic (ISL), and use it to derive a begin-anywhere, intra-procedural 
symbolic execution analysis that has no false positives by construction. In 
so doing, we take a step towards transferring modular, scalable techniques 
from the world of program verification to bug catching. 


Keywords: Program logics · Separation logic · Bug catching


1 Introduction 


There has been significant research on sound, local reasoning about the state
for proving the absence of bugs (e.g., [2,13,26,29,30, 41]). Locality leads to tech- 
niques that are compositional both in code (concentrating on a program com- 
ponent) and in the resources accessed (spatial locality), without tracking the 
entire global state or the global program within which a component sits. Com- 
positionality enables reasoning to scale to large teams and codebases: reasoning 
can be done even when a global program is not present (e.g., a library, or during 
program construction), without having to write the analogue of a test or verifi- 
cation harness, and the results of reasoning about components can be composed 
efficiently [11]. 

Meanwhile, many of the practical applications of symbolic reasoning have 
aimed at proving the presence of bugs (i.e., bug catching), rather than proving 
their absence (i.e., correctness). Logical bug catching methods include symbolic 
model checking [7, 12] and symbolic execution for testing [9]. These methods are 
usually formulated as global analyses; but, the rationale of local reasoning holds 
just as well for bug catching as it does for correctness: it has the potential to 
benefit scalability, reasoning about incomplete code, and continuous incremental
reasoning about a changing codebase within a continuous integration (CI) sys- 
tem [34]. Moreover, local evidence of a bug without usually-irrelevant contextual 
information can be more convincing and easier to understand and correct. 

There do exist symbolic bug catchers that, at least partly, address scalabil- 
ity and continuous reasoning. Tools such as Coverity [5,32] and Infer [18] hunt 
for bugs in large codebases with tens of millions of LOC, and they can even 
run incrementally (within minutes for small code changes), which is compati- 
ble with deployment in CI to detect regressions. However, although such tools 
intuitively share ideas with correctness-based compositional analyses [16], the 
existing foundations of correctness-based analyses do not adequately explain 
what these bug-catchers do, why they work, or the extent to which they work 
in practice. 

A notable such example is the relation between separation logic (SL) and 
Infer. SL provides novel techniques for local reasoning [28], with concise specifi- 
cations that focus only on the memory accessed [36]. Using SL, symbolic execu- 
tion need not begin from a “main” program, but rather can “begin anywhere” 
in a codebase, with constraints on the environment synthesized along the way. 
When analyzing a component, SL’s frame rule is used in concert with abductive 
inference to isolate a description of the memory utilized by the component [11]. 
Infer was closely inspired by SL, and demonstrates the power of SL’s local rea- 
soning: the ability to begin anywhere supports incremental analysis in CI, and 
compositionality leads to highly scalable methods. These features have led to 
non-trivial impact: a recent paper quotes over 100,000 Infer-reported bugs fixed 
in Facebook’s codebases, and thousands of security bugs found by a composi- 
tional taint analyzer, Zoncolan [18]. However, Infer reports bugs using heuristics 
based on failed proofs, whereas the SL theory behind Infer is based on 
over-approximation [11]. Thus, a critical aspect of Infer’s successful deployment is 
not supported by the theory that inspired it. This is unfortunate, especially 
given that the begin-anywhere and scalable aspects of Infer’s algorithms do not 
appear to be fundamentally tied to over-approximation. 

In this paper, we take a step towards transferring the local reasoning tech- 
niques from the world of program verification to that of bug catching. To app- 
roach the problem from first principles, we do not try to understand tools such 
as Coverity and Infer as they are. Instead, we take their existence and reported 
impact as motivation for revisiting the foundations of SL, this time re-casting it 
as a formalism for proving the presence of bugs rather than their absence. 

Our new logic, incorrectness separation logic (ISL), marries local reasoning 
based on SL’s frame rule with the recently-advanced incorrectness logic [35], a 
formalism for reasoning about errors based on an under-approximate analogue 
of Hoare triples [43]. We observe that the original SL model, based on partial 
heaps, is incompatible with local, under-approximate reasoning. The problem 
is that the original model does not distinguish a pointer known to be dangling 
from one about which we have no knowledge; this in turn contradicts the frame 
rule for under-approximate reasoning. However, we recover the frame rule for a 
refined model with negative heap assertions of the form $x \not\mapsto$, read “invalidated 
x”, stating that the location at x has been deallocated (and not re-allocated). 
Negative heaps were present informally in the original Infer, unsupported by the- 
ory but added for reporting use-after-free bugs (i.e., not for proving correctness). 
Interestingly, this semantic feature is needed in ISL for logical (and not merely 
pragmatic) reasons, in that it yields a sound logic for proving the presence of 
bugs: when ISL identifies a bug, then there is indeed a bug (no false positives), 
given the assumptions of the underlying ISL model. (That is, as usual, sound- 
ness is a relationship between assumptions and conclusions, and whether those 
assumptions match reality (i.e., running code) is a separate concern, outside the 
purview of logic.) 

As well as being superior for bug reporting, our new model has a pleasant fun- 
damental property in that it meshes better with intuitions originally expressed of 
SL. Specifically, our model admits a footprint theorem, stating that the meaning 
of a command is solely determined by its transitions on input-output heaplets 
of minimal size (including only the locations accessed), a theorem that was not 
true in full generality for the original SL model. Interestingly, ISL supports local 
reasoning for technically simpler reasons than the original SL (see Sect. 4.2). 

We validate part of the ISL promise using an illustrative program anal- 
ysis, Pulse, and use it to detect memory safety bugs, namely null-pointer- 
dereference and use-after-free bugs. Pulse is written inside Infer [18] and deployed 
at Facebook where it is used to report issues to C++ developers. Pulse is cur- 
rently under active development. In this paper, we explore the intra-procedural 
analysis, i.e., how it provides purely local reasoning about one procedure at a 
time without using results from other procedures; we defer formalising its inter- 
procedural (between procedures) analysis to future work. While leaving out the 
inter-procedural capabilities of Pulse only partly validates the promise of the 
ISL theory, it already demonstrates how ISL can scale to large codebases, and 
run incrementally in a way compatible with CI. Pulse thus has the capability to 
begin anywhere, and it achieves scalability while embracing under- rather than 
over-approximation. 


Outline. In Sect. 2 we present an intuitive account of ISL. In Sect. 3 we present 
the ISL proof system. In Sect. 4 we present the semantic model of ISL. In Sect. 5 
we present our ISL-based Pulse analysis. In Sect. 6 we discuss related work 
and conclude. The full proofs of all stated theorems are given in the techni- 
cal appendix [38]. 


2 Proof of a Bug 


We proceed with an intuitive description of ISL for detecting memory safety 
bugs. To do this, in Fig. 1 we present an example of a C++ use-after-lifetime bug, 
abstracted from real occurrences we have observed at Facebook, where 
use-after-lifetime bugs were one of the leading developer requests for C++ analysis. Given 
a vector v, a call to push_back(v) in the std::vector library may cause the 
internal array backing v to be (deallocated and subsequently) reallocated when v 

#include <iostream>
#include <vector>

void deref_after_pb(std::vector<int> *v) {
  int *x = &v->at(1);
  v->push_back(42);
  std::cout << *x << "\n"; }


push_back.cpp:7: error: VECTOR_INVALIDATION. accessing memory that was
potentially invalidated by ’std::vector::push_back()’ on line 6.

5.   int *x = &(v->at(1));
6.   v->push_back(42);
7. > std::cout << *x << "\n"; }


Fig. 1. The C++ use-after-lifetime bug (above); the Pulse error message (below). 


needs to grow to accommodate new elements. If the internal array is reallocated 
during the v->push_back(42) call, a use-after-lifetime bug occurs on the next 
line as x points into the previous array. Note how the Pulse error message (at 
the bottom of Fig. 1) refers to memory that has been invalidated. As we describe 
shortly, this information is tracked in Pulse with an invalidated heap assertion. 
For the theory in this paper, we do not want to descend into the details of 
C++, vectors, and so forth. Thus, for illustrative purposes, in Fig. 2 we present 
an adaptation of such use-after-lifetime bugs in C rather than C++, alongside its 
representation in the ISL language used in this paper. In this adaptation, the 
array at v is of size 1, and is reallocated in push_back non-deterministically to 
model its dynamic reallocation when growing. We next demonstrate how we can 
use ISL to detect the use-after-lifetime bug in the client procedure in Fig. 2. 


ISL Triples. The ISL theory uses under-approximate triples [35] of the form 
[presumption] C [ε : result], interpreted as: the result assertion describes a subset 
of the states that can be reached from the presumption assertion by executing C, 
where ε denotes an exit condition indicating either normal or exceptional 
(erroneous) termination. The under-approximate triples can be equivalently 
interpreted as: every state in result can be obtained by executing C on a starting 
state in presumption. By contrast, given a Hoare triple {pre} C {post}, the 
postcondition post describes a superset of states that are reachable from the 
precondition pre, and may include states unreachable from pre. Hoare logic is about 
over-approximation, allowing false positives but not negatives, whereas ISL is 
about under-approximation, allowing false negatives but not positives. 
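
The two readings can be contrasted on a toy example. The following Python sketch is illustrative only: the integer state space, the command `step`, and the helpers `post`, `valid_over` and `valid_under` are assumptions introduced here (they are not part of ISL or Pulse), and exit conditions are ignored for simplicity.

```python
from typing import Callable, Iterable, Set

State = int  # hypothetical toy state: the value of a single variable x


def post(step: Callable[[State], Iterable[State]], pre: Set[State]) -> Set[State]:
    """All states reachable by running the command once from a state in `pre`."""
    return {t for s in pre for t in step(s)}


def valid_over(pre: Set[State], step, result: Set[State]) -> bool:
    """Hoare-style validity: every reachable state lies in `result`."""
    return post(step, pre) <= result


def valid_under(pre: Set[State], step, result: Set[State]) -> bool:
    """Under-approximate validity: every state in `result` is reachable."""
    return result <= post(step, pre)


def step(x: State) -> Set[State]:
    """Toy command: (x := x + 1) + (x := 2 * x), a non-deterministic choice."""
    return {x + 1, 2 * x}


pre = {1, 2, 3}
reachable = post(step, pre)  # the exact set of reachable states
```

In this sketch, a result such as `{4, 6}` is a valid under-approximation but not a valid over-approximation, whereas `set(range(10))` is the reverse. The empty result is always a valid under-approximation, while an empty presumption validates no non-empty result.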


Bug Specification of client(v). Using ISL, we can specify the use-after-lifetime 
bug in client(v) as follows: 


$[v \mapsto a * a \mapsto -]\ \mathtt{client}(v)\ [er(L_{rx}) : \exists a'.\ v \mapsto a' * a' \mapsto - * a \not\mapsto]$   (PB-CLIENT) 

We make several remarks to illustrate the crucial features of ISL: 


• As in standard SL, * denotes the separating conjunction, read “and separately”. 
  It implies, e.g., that v, a’ and a are distinct in the result assertion. 

• The exit condition $er(L_{rx})$ denotes an erroneous termination: an error state 
  is reached at line $L_{rx}$, where a is dangling (invalidated). 

• The result is under-approximate: any state satisfying the result assertion can 
  be reached from some state satisfying the presumption. 

• The specification is local: it focuses only on memory locations in the 
  client(v) footprint (i.e., those touched by client(v)), and ignores other 
  locations. 


void push_back(int **v) {             push_back(v) ≜
  if (nondet()) {                       local z, y in
    free(*v);                           z := *;
    *v = malloc(sizeof(int));           (assume(z ≠ 0);
  }                                      L_rv: y := [v];
}                                        L_f: free(y);
                                         y := malloc(); [v] := y)
                                        + (assume(z = 0); skip)

void client(int **v) {                client(v) ≜
  int* x = *v;                          local x in
  push_back(v);                         x := [v];
  *x = 88; }                            push_back(v);
                                        L_rx: [x] := 88


Fig. 2. The push_back example in C (left); and in the ISL language (right). 


Let us next consider how we reason symbolically about this bug. Note that 
for the client(v) execution to reach an error at line $L_{rx}$, the push_back(v) 
call within it must not cause an error. That is, in contrast to PB-CLIENT, 
we need a specification for push_back(v) that describes normal, non-erroneous 
termination. We specify this normal execution with the ok exit condition as 
follows: 


$[v \mapsto a * a \mapsto -]\ \mathtt{push\_back}(v)\ [ok : \exists a'.\ v \mapsto a' * a' \mapsto - * a \not\mapsto]$   (PB-OK) 


PB-OK describes the case when push_back(v) frees the internal array of v 
at a (denoted by $a \not\mapsto$ in the result), and subsequently reallocates it at a’. 
Consequently, as a is invalidated after the push_back(v) call, the instruction 
following the call in client(v) dereferences invalidated memory at $L_{rx}$, causing 
an error. 

Note that the result assertion in PB-OK is strictly under-approximate in 
that it is smaller (stronger) than the exact “strongest post”. Given the assertion 
in the presumption, the strongest post must also consider the else clause 
of the conditional, when nondet() returns zero and push_back(v) does nothing. 
That is, the strongest post is the disjunction of the given result and the 
presumption. The ability to go below the strongest post soundly is a hallmark 
of under-approximate reasoning: it allows for compromise in an analyzer, where 
we might choose, e.g., to limit the number of paths explored for efficiency rea- 
sons, or to concretize an assertion partially when symbolic reasoning becomes 
difficult [35]. 
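
To make the gap between the PB-OK result and the strongest post concrete, the sketch below enumerates the normal outcomes of push_back on one concrete starting heap. It is a simplified model under stated assumptions: the addresses `V`, `A` and `FRESH`, the stored values, and the helper names are all hypothetical; allocation is assumed to return a fresh address; and the null-returning branch of malloc() is not modelled.

```python
BOT = None  # stands for the deallocation marker of the ISL heap model

V, A = 100, 1          # hypothetical addresses of the vector cell and its array
FRESH = (2, 3)         # hypothetical addresses that alloc() may return


def freeze(heap):
    """Hashable snapshot of a heap represented as a dict."""
    return frozenset(heap.items())


def push_back(heap, new_value=0):
    """Heaps reachable from `heap` by push_back(V) with exit condition ok."""
    reachable = {freeze(heap)}                 # branch: assume(z = 0); skip
    old = heap[V]                              # L_rv: y := [v]
    if heap.get(old, BOT) is BOT:
        return reachable                       # free(y) cannot succeed
    freed = {**heap, old: BOT}                 # L_f: free(y)
    for new in FRESH:                          # y := alloc()
        if new not in freed:
            reachable.add(freeze({**freed, new: new_value, V: new}))  # [v] := y
    return reachable


def in_pb_ok_result(frozen_heap):
    """Does the heap satisfy: exists a'. v |-> a' * a' |-> - * a invalidated?"""
    heap = dict(frozen_heap)
    new = heap[V]
    return (new != A and set(heap) == {V, A, new}
            and heap[new] is not BOT and heap[A] is BOT)


start = {V: A, A: 7}                           # a state in the presumption
strongest_post = push_back(start)
pb_ok_result = {h for h in strongest_post if in_pb_ok_result(h)}
```

Here `pb_ok_result` is a strict subset of `strongest_post`: the only reachable heap it leaves out is the unchanged starting heap, which corresponds to the else branch.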

We present proof outlines for PB-OK and PB-CLIENT in Fig. 3, where we 
annotate each step with a proof rule to connect to the ISL theory in Sect. 3. For 
legibility, uses of the FRAME rule are omitted as it is used in almost every step, 
and the consequence rule CONS is usually omitted when rewriting a formula 
to an equivalent one. For the moment, we encourage the reader to attempt to 
follow, prior to formalization, by mentally executing the program instructions 
on the assertions and asking: does the assertion at each program point 
under-approximate the states that can be obtained from the prior state? Note that 
each step updates assertions in-place, just as concrete execution does on concrete 
memory. For example, $L_f$: free(y) replaces $a \mapsto -$ with $a \not\mapsto$. In-place reasoning 
is a capability that the separating conjunction brings to symbolic execution; 
formally, this in-place aspect is achieved in the logic by applying the frame rule. 


3 Incorrectness Separation Logic (ISL) 


As a first attempt, it is tempting to obtain ISL straightforwardly by composing 
the standard semantics of SL [41] and the semantics of incorrectness logic [35]. 
Interestingly, this simplistic approach does not work. To see this, consider the 
following axiom for freeing memory, adapted from the corresponding SL axiom: 


$[x \mapsto -]\ \mathtt{free}(x)\ [ok : emp \wedge loc(x)]$ 


Here, emp describes the empty heap and loc(x) states that x is an addressable 
location; e.g., x cannot be null. Note that this ISL triple is valid in that any 
state satisfying the result assertion can be obtained from one satisfying the 
presumption assertion, and thus we do have a true under-approximate triple. 
However, in SL one can arbitrarily extend the state using the frame rule: 


$$\frac{\vdash [p]\ C\ [\epsilon : q] \qquad mod(C) \cap fv(r) = \emptyset}{\vdash [p * r]\ C\ [\epsilon : q * r]}\ \text{(FRAME)}$$


Intuitively, the state described by the frame assertion r lies outside the footprint 
of C and thus remains unchanged when executing C. However, if we do this with 
the free(x) axiom above, choosing $x \mapsto -$ as our frame, we run into a problem: 


$[x \mapsto - * x \mapsto -]\ \mathtt{free}(x)\ [ok : (emp \wedge loc(x)) * x \mapsto -]$ 


Here, the presumption is inconsistent but the result is not, and thus there is no 
way to get back to the presumption from the result; i.e., the triple is invalid. In 
over-approximate reasoning this does not cause a problem since an inconsistent 
precondition renders an over-approximate triple vacuously valid. By contrast, an 
inconsistent presumption does not validate under-approximate reasoning. 

Our way out of this conundrum is to consider a modified model in which 
the knowledge that a location was previously freed is a resource-oriented fact, 
using negative heap assertions. The negative heap assertion $x \not\mapsto$ conveys more 
knowledge than the loc(x) assertion. Specifically, $x \not\mapsto$ conveys: 1) the knowledge 
that x is an addressable location; 2) the knowledge that x has been deallocated; 
and 3) the ownership of location x. In other words, $x \not\mapsto$ is analogous to the 
points-to assertion $x \mapsto -$ and is thus manipulated similarly, taking up space in 
*-conjuncts. That is, we cannot consistently *-conjoin $x \not\mapsto$ either with $x \mapsto -$ 
or with itself: $x \mapsto - * x \not\mapsto \Leftrightarrow false$ and $x \not\mapsto * x \not\mapsto \Leftrightarrow false$. 


Proof sketch of PB-OK:

$[v \mapsto a * a \mapsto -]$
local y, z in
z := *;                                   // HAVOC
$[ok : z = 1 * v \mapsto a * a \mapsto -]$
( assume(z ≠ 0);                          // ASSUME
  $[ok : z = 1 * z \neq 0 * v \mapsto a * a \mapsto -]$
  L_rv: y := [v];                         // LOAD
  $[ok : z = 1 * y = a * v \mapsto a * a \mapsto -]$
  L_f: free(y);                           // FREE
  $[ok : z = 1 * y = a * v \mapsto a * a \not\mapsto]$
  y := malloc();                          // ALLOC1, CHOICE
  $[ok : z = 1 * v \mapsto a * a \not\mapsto * y \mapsto -]$
  [v] := y;                               // STORE
  $[ok : z = 1 * v \mapsto y * a \not\mapsto * y \mapsto -]$
) + (...)                                 // CHOICE
$[ok : z = 1 * v \mapsto y * a \not\mapsto * y \mapsto -]$
                                          // LOCAL
$[ok : \exists a'.\ v \mapsto a' * a' \mapsto - * a \not\mapsto]$

Proof sketch of PB-CLIENT:

$[v \mapsto a * a \mapsto -]$
local x in
x := [v];                                 // LOAD
$[ok : x = a * v \mapsto a * a \mapsto -]$
push_back(v);                             // PB-OK
$[ok : \exists a'.\ x = a * v \mapsto a' * a' \mapsto - * a \not\mapsto]$   // CONS
$[ok : \exists a'.\ x = a * v \mapsto a' * a' \mapsto - * x \not\mapsto]$
L_rx: [x] := 88;                          // STOREER
$[er(L_{rx}) : \exists a'.\ x = a * v \mapsto a' * a' \mapsto - * x \not\mapsto]$
                                          // LOCAL
$[er(L_{rx}) : \exists a'.\ v \mapsto a' * a' \mapsto - * a \not\mapsto]$


Fig. 3. The proof sketches of PB-OK (top) and PB-CLIENT (bottom). 

With such negative assertions, we can specify free() as the FREE axiom in 
Fig. 5. Note that this allows us to recover the frame rule: when we frame $x \mapsto -$ 
on both sides, we obtain the inconsistent assertion $x \mapsto - * x \not\mapsto$ (i.e., false) in 
the result, which always makes an under-approximate triple vacuously valid. 
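
The failure and recovery of the frame rule can be replayed on a one-location universe. The Python sketch below is an illustration under stated assumptions, not the paper's formal model: heaps are frozensets of (location, content) pairs, the single non-null address `X` makes loc(x) trivially true, and the names `star`, `free_classic` and `free_isl` are introduced here for the purpose of the example.

```python
from itertools import product

BOT = "bot"        # marker for a deallocated location (ISL-style model only)
X = 1              # the one (non-null) address under consideration
VALUES = (0, 1)    # a tiny value domain, to keep the sets finite


def star(p, q):
    """Separating conjunction on sets of heaps with disjoint domains."""
    out = set()
    for h1, h2 in product(p, q):
        if not ({loc for loc, _ in h1} & {loc for loc, _ in h2}):
            out.add(h1 | h2)
    return out


def valid_under(pre, step, result):
    """Every heap in `result` is reachable from some heap in `pre`."""
    return result <= {t for h in pre for t in step(h)}


points_to_any = {frozenset({(X, v)}) for v in VALUES}   # x |-> -
invalidated = {frozenset({(X, BOT)})}                   # x invalidated
emp = {frozenset()}


def free_classic(heap):
    """Original SL model: freeing removes the location from the heap."""
    d = dict(heap)
    if X not in d:
        return set()
    del d[X]
    return {frozenset(d.items())}


def free_isl(heap):
    """Refined model: freeing keeps the location and marks it deallocated."""
    d = dict(heap)
    if d.get(X, BOT) == BOT:
        return set()
    d[X] = BOT
    return {frozenset(d.items())}


frame = points_to_any
framed_classic_ok = valid_under(star(points_to_any, frame), free_classic,
                                star(emp, frame))
framed_isl_ok = valid_under(star(points_to_any, frame), free_isl,
                            star(invalidated, frame))
```

In this model `framed_classic_ok` is false, because the framed presumption is empty while the framed result is not. By contrast, `framed_isl_ok` is true, because the framed result is itself empty.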

We demonstrated how we arrived at negative heaps as a theoretical solution 
to recover the frame rule. However, negative heaps are more than a technical 
curiosity. In particular, a similar idea was informally present in Infer and has 
been used formally to reason about JavaScript [21]. Moreover, as we show in 
Sect. 4, negative heaps give rise to a footprint theorem (see Theorem 2). 

Negative heap assertions were previously used informally in Infer. They 
were also independently and formally introduced in a separation logic for 
JavaScript [21] to state that a field is not present in a JavaScript object, which 
is a natural property to express when reasoning about JavaScript. 

COMM ∋ C ::= skip | x := e | x := * | assume(B) | local x in C | C1; C2 | C1 + C2 | C*
           | x := alloc() | L: free(x) | L: x := [y] | L: [x] := y | L: error


if B then C1 else C2 ≜ (assume(B); C1) + (assume(!B); C2)
while(B) C ≜ (assume(B); C)*; assume(!B)
assert(B) ≜ (assume(!B); error) + assume(B)
x := malloc() ≜ x := alloc() + x := null


Fig. 4. The ISL Language (above); encoding standard constructs in ISL (below). 


Programming Language. To keep our presentation concise, we employ a simple 
heap-manipulating language as shown in Fig. 4. We assume an infinite set 
VAL of values; a finite set VAR of (program) variables; a standard interpreted 
language for expressions, EXP, containing variables and values; and a standard 
interpreted language for Boolean expressions, BEXP. We use v as a metavariable 
for values; x,y,z for program variables; e for expressions; and B for Boolean 
expressions. 

Our language is given by the C grammar and includes the standard constructs 
of skip, assignment (x := e), non-deterministic assignment (x := *, where * 
denotes a non-deterministically picked value), assume statements (assume(B)), 
scoped variable declaration (local x in C), sequential composition (C1; C2), 
non-deterministic choice (C1 + C2) and loops (C*), as well as error statements 
(error) and heap-manipulating instructions. Note that deterministic choice 
and loops (e.g., if and while statements) can be encoded using their 
non-deterministic counterparts and assume statements, as shown in Fig. 4. 

To better track errors, we annotate instructions that may cause an error with 
a label L ∈ LABEL. When an error is encountered (e.g., in L: error), we report 
the label of the offending instruction (e.g., L). As such, we only consider well- 
formed programs: those with unique labels across their constituent instructions. 
For brevity, we drop the instruction labels when they are immaterial to the 
discussion. 

As is standard practice, we use error statements as test oracles to detect viola- 
tions. In particular, error statements can be used to encode assert statements as 
shown in Fig. 4. Heap-manipulating instructions include allocation, deallocation, 
lookup and mutation. The x := alloc() instruction allocates a new (unused) 
location on the heap and returns it in x, and can be used to represent the 
standard, possibly null-returning malloc() from C as shown in Fig. 4. Dually, 
free(x) deallocates the location denoted by x. Heap lookup x := [y] reads the 
contents of the location denoted by y and returns it in x; heap mutation [x] := y 
overwrites the contents of the location denoted by x with y. 
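
The encodings of if and while in Fig. 4 can be tried out with a small relational interpreter. The following Python sketch is a simplification under stated assumptions: a state is a single integer, only normal (ok) termination is modelled, and the loop operator explores at most `bound` iterations, which is itself an under-approximation of C*. All function names are introduced here for illustration.

```python
def assume(cond):
    """assume(B): keep the state if B holds, otherwise produce no outcome."""
    return lambda s: {s} if cond(s) else set()


def assign(update):
    """Deterministic assignment, given as a function on states."""
    return lambda s: {update(s)}


def seq(c1, c2):
    """Sequential composition C1; C2."""
    return lambda s: {t for mid in c1(s) for t in c2(mid)}


def choice(c1, c2):
    """Non-deterministic choice C1 + C2."""
    return lambda s: c1(s) | c2(s)


def star(c, bound):
    """C*, explored for at most `bound` iterations."""
    def run(s):
        seen, frontier = {s}, {s}
        for _ in range(bound):
            frontier = {t for mid in frontier for t in c(mid)} - seen
            seen |= frontier
        return seen
    return run


def if_then_else(cond, c1, c2):
    """(assume(B); C1) + (assume(!B); C2)"""
    return choice(seq(assume(cond), c1),
                  seq(assume(lambda s: not cond(s)), c2))


def while_do(cond, body, bound):
    """(assume(B); C)*; assume(!B)"""
    return seq(star(seq(assume(cond), body), bound),
               assume(lambda s: not cond(s)))


count_to_3 = while_do(lambda x: x < 3, assign(lambda x: x + 1), bound=10)
too_short = while_do(lambda x: x < 3, assign(lambda x: x + 1), bound=2)
halve_or_grow = if_then_else(lambda x: x % 2 == 0,
                             assign(lambda x: x // 2),
                             assign(lambda x: 3 * x + 1))
```

With a bound that is too small, as in `too_short`, the encoded loop yields no final state at all. This loses outcomes but never reports a state that the unbounded loop could not reach.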


Assertions. The ISL assertion language is given by the grammar below, where 
$\oplus \in \{=, \neq, <, \leq, \ldots\}$. We use p, q, r as metavariables for assertions. 


$\mathrm{AST} \ni p, q, r ::= false \mid p \Rightarrow q \mid \exists x.\ p \mid e \oplus e'$   (classical and Boolean assertions) 
$\qquad \mid emp \mid e \mapsto e' \mid e \not\mapsto \mid p * q$   (structural assertions) 

As we describe formally in Sect. 4, assertions describe sets of states, where each 
state comprises a (variable) store and a heap. The classical (first-order logic) 
and Boolean assertions are standard. Other classical connectives can be encoded 
using existing ones (e.g., $\neg p \triangleq p \Rightarrow false$). Aside from the highlighted $x \not\mapsto$, 
structural assertions are as defined in SL [28], and describe a set of states by 
constraining the shape of the underlying heap. More concretely, emp describes 
states in which the heap is empty; $e \mapsto e'$ describes states in which the heap 
comprises a single location denoted by e containing the value denoted by e’; and 
p * q describes states in which the heap can be split into two disjoint sub-heaps, 
one satisfying p and the other q. We often write $e \mapsto -$ as a shorthand for 
$\exists v.\ e \mapsto v$. 

As described above, we extend our structural assertions with the negative 
heap assertion $e \not\mapsto$ (read “e is invalidated”). As with its positive counterpart 
$e \mapsto e'$, the negative assertion $e \not\mapsto$ describes states in which the heap comprises 
a single location at e. However, whilst $e \mapsto e'$ states that the location at e 
is allocated (and contains the value e’), $e \not\mapsto$ states that the location at e is 
deallocated. 


ISL Proof Rules (Syntactic ISL Triples). We present the ISL proof rules in 
Fig. 5. As in incorrectness logic [35], the ISL triples are of the form $\vdash [p]\ C\ [\epsilon : q]$, 
denoting that every state in the result assertion q is reachable from some state 
in the presumption assertion p with exit condition ε. That is, for each state $\sigma_q$ in 
q, there exists $\sigma_p$ in p such that executing C on $\sigma_p$ terminates with ε and yields 
$\sigma_q$. As such, since false describes an empty state set, $[p]\ C\ [\epsilon : false]$ is vacuously 
valid for all p, C, ε. Dually, $[false]\ C\ [\epsilon : q]$ is always invalid when $q \not\Leftrightarrow false$. 

An exit condition, ε ∈ EXIT, may be: 1) ok, denoting a successful execution; 
or 2) er(L), denoting an erroneous execution with the error encountered 
at the L-labeled instruction. Compared to [35], we further annotate our error 
conditions to track the offending instructions. Moreover, whilst [35] rules only 
detect explicit errors caused by error statements, ISL rules additionally allow 
us to track errors caused by memory safety violations, namely “use-after-free” 
violations, where a previously deallocated location is subsequently accessed in 
the program, and “null-pointer-dereference” violations. Although it is straight- 
forward to distinguish between explicit and memory safety errors, for brevity we 
use er(L) for both. 

Thanks to the separation afforded by ISL assertions, compared to incorrectness 
triples in [35], ISL triples are local in that the states described by their 
presumptions only contain the resources needed by the program. For instance, 
as skip requires no resource for successful execution, the presumption of SKIP 
is simply given by emp, which remains unchanged in the result. Similarly, 
assume(B) requires no resource and results in a state satisfying B. The ASSIGN 
rule is analogous to its SL counterpart. Similarly, x := * in HAVOC assigns a 
non-deterministic value to x. Although these axioms (and ALLOC1, ALLOC2) ask for 
a single equality x = x’ in their presumption, one can derive more general triples 
starting from any presumption p by picking a fresh x’ and applying the axiom, 
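and the FRAME and CONS rules on the equivalent presumption $x = x' * p[x'/x]$. 

SKIP        $\vdash [emp]\ \mathtt{skip}\ [ok : emp]$
ASSIGN      $\vdash [x = x']\ x := e\ [ok : x = e[x'/x]]$
HAVOC       $\vdash [x = x']\ x := *\ [ok : x = v]$
ASSUME      $\vdash [emp]\ \mathtt{assume}(B)\ [ok : B]$
ERROR       $\vdash [emp]\ L\colon \mathtt{error}\ [er(L) : emp]$

SEQ1        $\dfrac{\vdash [p]\ C_1\ [er(L) : q]}{\vdash [p]\ C_1; C_2\ [er(L) : q]}$
SEQ2        $\dfrac{\vdash [p]\ C_1\ [ok : r] \qquad \vdash [r]\ C_2\ [\epsilon : q]}{\vdash [p]\ C_1; C_2\ [\epsilon : q]}$
CHOICE      $\dfrac{\vdash [p]\ C_i\ [\epsilon : q] \text{ for some } i \in \{1, 2\}}{\vdash [p]\ C_1 + C_2\ [\epsilon : q]}$
EXIST       $\dfrac{\vdash [p]\ C\ [\epsilon : q] \qquad x \notin fv(C)}{\vdash [\exists x.\ p]\ C\ [\epsilon : \exists x.\ q]}$
LOOP2       $\dfrac{\vdash [p]\ C^{*}; C\ [\epsilon : q]}{\vdash [p]\ C^{*}\ [\epsilon : q]}$
CONS        $\dfrac{p' \Rightarrow p \qquad \vdash [p']\ C\ [\epsilon : q'] \qquad q \Rightarrow q'}{\vdash [p]\ C\ [\epsilon : q]}$
DISJ        $\dfrac{\vdash [p_1]\ C\ [\epsilon : q_1] \qquad \vdash [p_2]\ C\ [\epsilon : q_2]}{\vdash [p_1 \vee p_2]\ C\ [\epsilon : q_1 \vee q_2]}$
SUBST       $\dfrac{\vdash [p]\ C\ [\epsilon : q] \qquad y \notin fv(p, C, q)}{\vdash [p[y/x]]\ C[y/x]\ [\epsilon : q[y/x]]}$
LOCAL       $\dfrac{\vdash [p]\ C\ [\epsilon : q]}{\vdash [\exists x.\ p]\ \mathtt{local}\ x\ \mathtt{in}\ C\ [\epsilon : \exists x.\ q]}$
FRAME       $\dfrac{\vdash [p]\ C\ [\epsilon : q] \qquad mod(C) \cap fv(r) = \emptyset}{\vdash [p * r]\ C\ [\epsilon : q * r]}$

FREE        $\vdash [x \mapsto e]\ L\colon \mathtt{free}(x)\ [ok : x \not\mapsto]$
FREEER      $\vdash [x \not\mapsto]\ L\colon \mathtt{free}(x)\ [er(L) : x \not\mapsto]$
FREENULL    $\vdash [x = null]\ L\colon \mathtt{free}(x)\ [er(L) : x = null]$
ALLOC2      $\vdash [x = x' * y \not\mapsto]\ x := \mathtt{alloc}()\ [ok : x = y * y \mapsto -]$

LOAD        $\vdash [x = x' * y \mapsto e]\ L\colon x := [y]\ [ok : x = e[x'/x] * y \mapsto e[x'/x]]$
LOADER      $\vdash [y \not\mapsto]\ L\colon x := [y]\ [er(L) : y \not\mapsto]$
LOADNULL    $\vdash [y = null]\ L\colon x := [y]\ [er(L) : y = null]$

STORE       $\vdash [x \mapsto e]\ L\colon [x] := y\ [ok : x \mapsto y]$
STOREER     $\vdash [x \not\mapsto]\ L\colon [x] := y\ [er(L) : x \not\mapsto]$
STORENULL   $\vdash [x = null]\ L\colon [x] := y\ [er(L) : x = null]$


Fig. 5. The ISL proof rules where x and x' are distinct variables. 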


Note that skip, assignments and assume statements always terminate suc- 
cessfully (with ok). By contrast, L: error always terminates erroneously (with 
er(L)) and requires no resource. The ISL rules SEQ1, SEQ2, CHOICE, LOOP1, 
LOOP2, CONS, DISJ and SUBST are as in [35]. The SEQ1 rule captures 
short-circuiting when the first statement (C1) encounters an error and thus the 
program terminates erroneously. Analogously, SEQ2 states that when C1 executes 
successfully, the program terminates with ε when the subsequent C2 statement 
terminates with ε. The CHOICE rule states that the states in q are reachable from 
p when executing C1 + C2 if they are reachable from p when executing either 
branch. LOOP1 captures immediate exit from the loop; LOOP2 states that q is 
reachable from p when executing C* if it is reachable after a non-zero number 
of C iterations. 

The Cons rule allows us to strengthen the result and weaken the presump- 
tion: if q’ is reachable from p’, then the smaller q is reachable from the bigger p. 
Note that compared to SL, the directions of the implications in the CONS premise are 
flipped. Using CONS, we can rewrite the premises of DISJ as $[p_1 \vee p_2]\ C\ [\epsilon : q_1]$ 
and $[p_1 \vee p_2]\ C\ [\epsilon : q_2]$. As such, if both $q_1$ and $q_2$ are reachable from $p_1 \vee p_2$, 
then $q_1 \vee q_2$ is also reachable from $p_1 \vee p_2$, as shown in DISJ. The EXIST rule 
is derived from DISJ; SUBST is standard and allows us to substitute x with a 
fresh variable y; LOCAL is equivalent to that in [35] but uses the Barendregt 
variable convention, renaming variables in formulas instead of in commands to 
avoid clashes. 

As in SL, the crux of ISL reasoning lies in the FRAME rule, allowing one to 
extend the presumption and the result of a triple with disjoint resources in r. 
The fv(r) function returns the set of free variables in r, and mod(C) returns the 
set of (program) variables modified by C (i.e., those on the left-hand side of ‘:=’ in 
assignment, lookup and allocation). These definitions are standard and elided. 

Negative assertions allow us to detect memory safety violations when accessing 
deallocated locations. For instance, FREEER states that attempting to deallocate 
x causes an error when x is already deallocated; mutatis mutandis for 
LOADER and STOREER. As shown in ALLOC2, we can use negative assertions 
to allocate a previously-deallocated location: if y is deallocated ($y \not\mapsto$ holds in 
the presumption), then it may be reallocated. The FREENULL, LOADNULL and 
STORENULL rules state that accessing x causes an error when x is null. Finally, 
LOAD and STORE describe the successful execution of heap lookup and mutation, 
respectively. 
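
Read operationally, the heap rules above amount to the following checks. This Python sketch is an illustrative executable reading under stated assumptions, not Pulse's implementation: stores and heaps are plain dictionaries, null is encoded as 0, the caller chooses the address returned by allocation, and the labels `L_x_load` and `L_v_store` are hypothetical additions to those of Fig. 2.

```python
BOT = object()   # marker for a deallocated (invalidated) location
NULL = 0         # assumed encoding of the null pointer


def _fault(store, heap, var):
    """True when dereferencing `var` is a null or use-after-free access."""
    loc = store[var]
    return loc == NULL or heap.get(loc) is BOT


def free(label, store, heap, x):
    if _fault(store, heap, x):
        return ("er", label), store, heap             # FREENULL / FREEER
    if store[x] not in heap:
        return None                                   # location not owned: no rule applies
    return "ok", store, {**heap, store[x]: BOT}       # FREE


def load(label, store, heap, x, y):
    if _fault(store, heap, y):
        return ("er", label), store, heap             # LOADNULL / LOADER
    if store[y] not in heap:
        return None
    return "ok", {**store, x: heap[store[y]]}, heap   # LOAD


def store_to(label, store, heap, x, value):
    if _fault(store, heap, x):
        return ("er", label), store, heap             # STORENULL / STOREER
    if store[x] not in heap:
        return None
    return "ok", store, {**heap, store[x]: value}     # STORE


def alloc(store, heap, x, loc, value=0):
    """Allocation at a caller-chosen address that is unused or deallocated."""
    if loc == NULL or heap.get(loc, BOT) is not BOT:
        return None
    return "ok", {**store, x: loc}, {**heap, loc: value}


def client_trace():
    """One path through client(v), taking the reallocating branch of push_back."""
    s, h = {"v": 100, "x": NULL, "y": NULL}, {100: 1, 1: 7}
    _, s, h = load("L_x_load", s, h, "x", "v")        # x := [v]
    _, s, h = load("L_rv", s, h, "y", "v")            # y := [v]
    _, s, h = free("L_f", s, h, "y")                  # free(y)
    _, s, h = alloc(s, h, "y", 2)                     # y := alloc()
    _, s, h = store_to("L_v_store", s, h, "v", s["y"])  # [v] := y
    return store_to("L_rx", s, h, "x", 88)            # L_rx: [x] := 88
```

Calling `client_trace()` ends with the exit condition `("er", "L_rx")`, since x still holds the address that push_back invalidated.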


Remark 1. Note that mutation and deallocation rules in SL are given as 
$\{x \mapsto -\}\ [x] := y\ \{x \mapsto y\}$ and $\{x \mapsto -\}\ \mathtt{free}(x)\ \{emp\}$; i.e., the value of x is existentially 
quantified in the precondition. We can similarly rewrite the ISL rules as: 


STOREWEAK   $\vdash [x \mapsto -]\ [x] := y\ [ok : x \mapsto y]$
FREEWEAK    $\vdash [x \mapsto -]\ \mathtt{free}(x)\ [ok : x \not\mapsto]$


However, these rules are too weak. For instance, we cannot use STOREWEAK 
to prove $[x \mapsto 7]\ [x] := y\ [ok : x \mapsto y]$. This is because the implications in the 
premise of the CONS rule are flipped from those in their SL counterpart, and 
thus to use STOREWEAK we must show $x \mapsto - \Rightarrow x \mapsto 7$, which we cannot. Put 
differently, STOREWEAK states that for some value v, executing [x] := y on a 
state satisfying $x \mapsto v$ yields a state satisfying $x \mapsto y$. However, this statement 
is valid for all values of v. As such, we strengthen the presumption of STORE to 
$x \mapsto e$, allowing for an arbitrary (universally quantified) expression e at x. 

In general, in over-approximate logics (e.g., SL) the aim is to weaken the 
preconditions and strengthen the postconditions of specifications as much as 
possible. This is to ensure that we can optimally apply the Cons rule to adapt 
the specifications to broader contexts. Conversely, in under-approximate logics 
(e.g., ISL) we should strengthen the presumptions and weaken the results as 
much as possible, since the implication directions in the premise of CONS are 
flipped. 


Remark 2. The backward reasoning rules of SL [28] are generally unsound 
for ISL, just as the backward reasoning rules of Hoare logic are unsound 
for incorrectness logic [35]. For instance, the backward axiom for store is 
$\{x \mapsto - * (x \mapsto y \mathrel{-\!\!*} p)\}\ [x] := y\ \{p\}$. However, taking p = emp yields an inconsistent 
precondition, resulting in the triple $\{false\}\ [x] := y\ \{emp\}$, which is valid 
in SL but not ISL. 


Proving PB-OK and PB-CLIENT. We next return to the proof sketch of 
PB-OK in Fig. 3. For brevity, rather than giving full derivations, we follow 
the classical Hoare logic proof outline, annotating each line of the code with 
its presumption and result. We further commentate each proof step and write 
e.g., //CHOICE to denote an application of CHOICE. Note that when applying 
CHOICE, we pick a branch (e.g., the left branch in PB-OK) to execute. Observe 
that unlike in SL where one needs to reason about all branches, in ISL it suf- 
fices to pick and reason about a single branch, and the remaining branches are 
ignored. 

As in Hoare logic proof outlines, we assume that SEQ2 is applied at every step; 
i.e., later instructions are executed only if the earlier ones execute successfully. 
In most steps, we apply FRAME to frame off the unused resource r, carry out 
the instruction effect, and subsequently frame on r. For instance, when verifying 
z := * in the proof sketch of PB-OK, we apply HAVOC to pick a non-zero value for 
z (in this case 1) after the assignment. As such, since the presumption of HAVOC 
is emp, we use FRAME to frame off the resource $v \mapsto a * a \mapsto -$ in the presumption, 
apply HAVOC to obtain z = 1, and subsequently frame on $v \mapsto a * a \mapsto -$, yielding 
$z = 1 * v \mapsto a * a \mapsto -$. For brevity, we keep the applications of FRAME and 
SEQ2 implicit and omit them in our annotations. The proof of PB-CLIENT in 
Fig. 3 is then straightforward and applies the PB-OK specification when calling 
push_back(v). We refer the reader to the technical appendix [38] where we apply 
ISL to a further example to detect a null-pointer-dereference bug in OpenSSL. 


4 The ISL Model 


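Denotational Semantics. We present the ISL semantics in Fig. 6. The semantics 
of a statement C ∈ COMM under an exit condition ε ∈ EXIT, written $[\![C]\!]\epsilon$, 
is described as a relation on program states. A program state, σ ∈ STATE, is 
a pair of the form (s, h), comprising a (variable) store s ∈ STORE and a heap 
h ∈ HEAP. 

$[\![\cdot]\!] : \mathrm{COMM} \to \mathrm{EXIT} \to \mathcal{P}(\mathrm{STATE} \times \mathrm{STATE})$
$\sigma \in \mathrm{STATE} \triangleq \mathrm{STORE} \times \mathrm{HEAP}$
$s \in \mathrm{STORE} \triangleq \mathrm{VAR} \to \mathrm{VAL}$
$h \in \mathrm{HEAP} \triangleq \mathrm{LOC} \rightharpoonup \mathrm{VAL} \uplus \{\bot\}$
$l \in \mathrm{LOC} \subseteq \mathrm{VAL}$

$[\![\mathtt{skip}]\!]ok \triangleq \{(\sigma, \sigma) \mid \sigma \in \mathrm{STATE}\}$
$[\![\mathtt{skip}]\!]er(-) \triangleq \emptyset$
$[\![x := e]\!]ok \triangleq \{((s, h), (s[x \mapsto s(e)], h))\}$
$[\![x := e]\!]er(-) \triangleq \emptyset$
$[\![x := *]\!]ok \triangleq \{((s, h), (s[x \mapsto v], h)) \mid v \in \mathrm{VAL}\}$
$[\![x := *]\!]er(-) \triangleq \emptyset$
$[\![\mathtt{assume}(B)]\!]ok \triangleq \{(\sigma, \sigma) \mid \sigma = (s, h) \wedge s(B) \neq 0\}$
$[\![\mathtt{assume}(B)]\!]er(-) \triangleq \emptyset$
$[\![L\colon \mathtt{error}]\!]ok \triangleq \emptyset$
$[\![L\colon \mathtt{error}]\!]er(L') \triangleq \{(\sigma, \sigma) \mid L = L'\}$

$[\![C_1; C_2]\!]\epsilon \triangleq \{(\sigma, \sigma') \mid (\epsilon \neq ok \wedge (\sigma, \sigma') \in [\![C_1]\!]\epsilon) \vee \exists \sigma''.\ (\sigma, \sigma'') \in [\![C_1]\!]ok \wedge (\sigma'', \sigma') \in [\![C_2]\!]\epsilon\}$
$[\![\mathtt{local}\ x\ \mathtt{in}\ C]\!]\epsilon \triangleq \{((s[x \mapsto v], h), (s'[x \mapsto v], h')) \mid ((s, h), (s', h')) \in [\![C]\!]\epsilon\}$
$[\![C_1 + C_2]\!]\epsilon \triangleq [\![C_1]\!]\epsilon \cup [\![C_2]\!]\epsilon$
$[\![C^{*}]\!]\epsilon \triangleq \bigcup_{i \in \mathbb{N}} [\![C^{i}]\!]\epsilon$ with $C^{0} \triangleq \mathtt{skip}$ and $C^{i+1} \triangleq C; C^{i}$

$[\![x := \mathtt{alloc}()]\!]ok \triangleq \{(\sigma, (s[x \mapsto l], h[l \mapsto v])) \mid \sigma = (s, h) \wedge v \in \mathrm{VAL} \wedge (l \notin dom(h) \vee h(l) = \bot)\}$
$[\![x := \mathtt{alloc}()]\!]er(-) \triangleq \emptyset$
$[\![L\colon \mathtt{free}(x)]\!]ok \triangleq \{(\sigma, (s, h[s(x) \mapsto \bot])) \mid \sigma = (s, h) \wedge h(s(x)) \in \mathrm{VAL}\}$
$[\![L\colon \mathtt{free}(x)]\!]er(L') \triangleq \{(\sigma, \sigma) \mid L = L' \wedge \sigma = (s, h) \wedge (s(x) = null \vee h(s(x)) = \bot)\}$
$[\![L\colon x := [y]]\!]ok \triangleq \{(\sigma, (s[x \mapsto v], h)) \mid \sigma = (s, h) \wedge h(s(y)) = v \in \mathrm{VAL}\}$
$[\![L\colon x := [y]]\!]er(L') \triangleq \{(\sigma, \sigma) \mid L = L' \wedge \sigma = (s, h) \wedge (s(y) = null \vee h(s(y)) = \bot)\}$
$[\![L\colon [x] := y]\!]ok \triangleq \{(\sigma, (s, h[s(x) \mapsto s(y)])) \mid \sigma = (s, h) \wedge h(s(x)) \in \mathrm{VAL}\}$
$[\![L\colon [x] := y]\!]er(L') \triangleq \{(\sigma, \sigma) \mid L = L' \wedge \sigma = (s, h) \wedge (s(x) = null \vee h(s(x)) = \bot)\}$

$(\!|emp|\!) \triangleq \{(s, h) \mid dom(h) = \emptyset\}$
$(\!|e \mapsto e'|\!) \triangleq \{(s, h) \mid dom(h) = \{s(e)\} \wedge h(s(e)) = s(e') \neq \bot\}$
$(\!|e \not\mapsto|\!) \triangleq \{(s, h) \mid dom(h) = \{s(e)\} \wedge h(s(e)) = \bot\}$
$(\!|p * q|\!) \triangleq \{\sigma_p \bullet \sigma_q \mid \sigma_p \in (\!|p|\!) \wedge \sigma_q \in (\!|q|\!)\}$

where $(s_1, h_1) \bullet (s_2, h_2) \triangleq (s_1, h_1 \uplus h_2)$ if $s_1 = s_2 \wedge dom(h_1) \cap dom(h_2) = \emptyset$, and is undefined otherwise


Fig. 6. The ISL denotational semantics (top); the ISL assertion semantics (bottom). 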


A store is a function from variables to values. Given a store s, expression e 
and Boolean expression B, we write s(e) and s(B) for the values to which e and 
B evaluate under s, respectively. These definitions are standard and omitted. 

A heap is a partial function from locations, $\textsc{Loc}$, to $\textsc{Val} \uplus \{\bot\}$. We model
heaps as partial functions as they may grow gradually by allocating additional
locations. We use the designated value $\bot \notin \textsc{Val}$ to track those locations that
have been deallocated. That is, given $l \in \textsc{Loc}$, if $h(l) \in \textsc{Val}$ then $l$ is allocated
in $h$ and holds value $h(l)$; and if $h(l) = \bot$ then $l$ has been deallocated. As we
demonstrate shortly, we use $\bot$ to model invalidated assertions such as $x \not\mapsto$.
The semantics in Fig. 6 closely corresponds to ISL rules in Fig. 5. For instance,
$\llbracket x := [y] \rrbracket\mathit{ok}$ underpins LOAD, while $\llbracket x := [y] \rrbracket\mathit{er}(-)$ underpins LOADER and
LOADNULL; e.g., if the location at $y$ is deallocated ($h(s(y)) = \bot$), then executing
$x := [y]$ terminates erroneously as captured by $\llbracket x := [y] \rrbracket\mathit{er}(-)$. The semantics of
mutation, allocation and deallocation are defined analogously. As shown, skip,
assignment and assume(B) never terminate erroneously (e.g., $\llbracket \mathtt{skip} \rrbracket\mathit{er}(-) = \emptyset$),
and the semantics of their successful execution is standard. The two disjuncts
in $\llbracket C_1; C_2 \rrbracket\epsilon$ capture SEQ1 and SEQ2, respectively. The semantics of $C_1 + C_2$ is
defined as the union of those of its two branches. The semantics of $C^{\star}$ is defined
as the union of the semantics of zero or more $C$ iterations.


Heap Monotonicity. Note that for all $C$, $\epsilon$ and $(\sigma_p, \sigma_q) \in \llbracket C \rrbracket\epsilon$, the (domain
of the) underlying heap monotonically grows from $\sigma_p$ to $\sigma_q$ and never
shrinks. In particular, whilst the heap domain grows via allocation, all other
base cases (including deallocation) leave the domain of the heap (i.e., the heap
size) unchanged: deallocation merely updates the value of the given location in
the heap to $\bot$ and thus does not alter the heap domain. This is in contrast to the
original SL model [28], where deallocation removes the given location from the 
heap, and thus the underlying heap may grow or shrink. As we discuss shortly, 
this monotonicity is the key reason why our model supports a footprint theorem. 
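
As a concrete reading of the deallocation and load cases and of the monotonicity observation, the following Python sketch models a heap as a dictionary from locations to values, with a sentinel standing for $\bot$. The function names (`free`, `load`, `access_error`), the encoding of exit conditions as `"ok"` or `("er", label)`, and the sample state are assumptions made for illustration only; they are not taken from the paper or from any tool.

```python
# Hypothetical model of two heap cases of Fig. 6; all names are illustrative.
BOTTOM = object()   # sentinel for the designated value "bottom" (deallocated)
NULL = None         # stands for null


def access_error(h, loc):
    """An access faults when loc is null or has been deallocated."""
    return loc is NULL or h.get(loc) is BOTTOM


def free(label, s, h, x):
    """Outcomes of `label: free(x)` from (s, h) as (exit, store, heap) triples."""
    loc = s[x]
    if access_error(h, loc):
        return [(("er", label), s, h)]           # error: state left unchanged
    if loc in h:
        return [("ok", s, {**h, loc: BOTTOM})]   # keep loc in the domain, mark it
    return []                                    # loc outside dom(h): no outcome


def load(label, s, h, x, y):
    """Outcomes of `label: x := [y]` from (s, h)."""
    loc = s[y]
    if access_error(h, loc):
        return [(("er", label), s, h)]
    if loc in h:
        return [("ok", {**s, x: h[loc]}, h)]
    return []


# Free a location, then read it back: a use-after-free at label L2.
s0, h0 = {"x": 0, "y": 100}, {100: 7}
[(exit1, s1, h1)] = free("L1", s0, h0, "y")
[(exit2, s2, h2)] = load("L2", s1, h1, "x", "y")
```

In this sketch `free` never removes a key from the dictionary, so the heap domain after the call equals the domain before it, which is the monotonicity property in miniature.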


ISL Assertion Semantics. The semantics of ISL assertions is given at the
bottom of Fig. 6 via the function $(\!|.|\!) : \textsc{Ast} \to \mathcal{P}(\textsc{State})$, interpreting each
assertion as a set of states. The semantics of classical and Boolean assertions are
standard and omitted. As described in Sect. 3, $\mathsf{emp}$ describes states in which the
heap is empty; and $e \mapsto e'$ describes states of the form $(s, h)$ in which $h$ contains
a single location at $s(e)$ with value $s(e')$. Analogously, $e \not\mapsto$ describes states of
the form $(s, h)$ in which $h$ contains a single deallocated location at $s(e)$. Finally,
the interpretation of $p * q$ contains a state $\sigma$ iff it can be split into two parts,
$\sigma = \sigma_p \bullet \sigma_q$, such that $\sigma_p$ and $\sigma_q$ are included in the interpretations of $p$ and
$q$, respectively. The function $\bullet : \textsc{State} \times \textsc{State} \rightharpoonup \textsc{State}$ given at the bottom
of Fig. 6 denotes state composition, and is defined when the constituent stores
agree and the heaps are disjoint. For brevity, we often write $\sigma \in p$ for $\sigma \in (\!|p|\!)$.
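
State composition and the interpretation of the separating conjunction can be sketched over finite sets of states as follows. The encoding (a state as a pair of dictionaries, `None` for an undefined composition) and the names `compose` and `star` are assumptions of this sketch, not notation from the paper.

```python
def compose(state1, state2):
    """Return the composition of two states, or None when it is undefined."""
    (s1, h1), (s2, h2) = state1, state2
    if s1 != s2 or set(h1) & set(h2):
        return None                      # stores disagree or heaps overlap
    return (s1, {**h1, **h2})            # disjoint union of the two heaps


def star(p_states, q_states):
    """States of p * q, given finite lists of the states of p and of q."""
    combined = (compose(sp, sq) for sp in p_states for sq in q_states)
    return [state for state in combined if state is not None]


store = {"x": 1, "y": 2}
x_points_to_5 = [(store, {1: 5})]        # states of x |-> 5
y_points_to_6 = [(store, {2: 6})]        # states of y |-> 6
both = star(x_points_to_5, y_points_to_6)
overlap = star(x_points_to_5, x_points_to_5)
```

Here `both` holds the single two-cell state, while `overlap` is empty because a cell cannot be split into two copies of itself.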


Semantic Incorrectness Triples. We next present the formal interpretation 
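of ISL triples. Recall from Sect. 3 that an ISL triple $[p]\ C\ [\epsilon : q]$ states that every
state in $q$ is reachable from some state in $p$ under $\epsilon$. Put formally:


\[
\models [p]\ C\ [\epsilon : q] \;\stackrel{\text{def}}{\iff}\; \forall \sigma_q \in q.\ \exists \sigma_p \in p.\ (\sigma_p, \sigma_q) \in \llbracket C \rrbracket\epsilon
\]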
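
On a finite relation this definition can be checked directly. The sketch below does so and contrasts it with the over-approximate (Hoare-style) reading; the function names and the toy command relation `rel` are assumptions chosen for illustration.

```python
def under_valid(p, rel, q):
    """Under-approximate triple: every state in q is reached from some state in p."""
    return all(any((sp, sq) in rel for sp in p) for sq in q)


def over_valid(p, rel, q):
    """Hoare-style reading: every state reached from p lies in q."""
    return all(sq in q for (sp, sq) in rel if sp in p)


# Hypothetical command: from x = 0, nondeterministically set x to 1 or 2.
# States are abbreviated to the value of x.
rel = {(0, 1), (0, 2)}

shrunk_result = under_valid({0}, rel, {1})       # dropping a reachable state is fine
unreachable = under_valid({0}, rel, {1, 3})      # 3 is not reachable
```

Shrinking the result assertion keeps an under-approximate triple valid, whereas enlarging it with an unreachable state does not; the over-approximate reading behaves the other way around.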


Finally, in the following theorem we show that the ISL proof rules are sound: if
a triple $\vdash [p]\ C\ [\epsilon : q]$ is derivable using the rules in Fig. 5, then $\models [p]\ C\ [\epsilon : q]$ holds.


Theorem 1 (Soundness). For all $p, C, \epsilon, q$, if $\vdash [p]\ C\ [\epsilon : q]$, then
$\models [p]\ C\ [\epsilon : q]$.

4.1 The Footprint Theorem


The frame rule of SL enables local reasoning about a command C by concen- 
trating only on the parts of the memory that are accessed by C, i.e., the C 
footprint: 


‘To understand how a program works, it should be possible for reasoning and 
specification to be confined to the cells that the program actually accesses. 
The value of any other cell will automatically remain unchanged.’ [36] 


Local reasoning is then enabled by semantic observations about the local effect 
of heap accesses. In what follows we describe some of the semantic structure 
underpinning under-approximate local reasoning, including how it differs from 
the classic over-approximate theory. Our main result is a footprint theorem, 
stating that the meaning of a command C is determined by its action on the 
“small” part of the memory accessed by C (i.e., the C footprint). The overall
meaning of C can then be obtained by “fleshing out” its footprint.
To see this, consider the following example: 


1. free(y);
2. L2: free(y) + free(x);        (FOOT)
3. L3: free(x) + skip


For simplicity, let us ignore variable stores for the moment and consider the
executions of FOOT from an initial heap $h = [l_x \mapsto 1, l_y \mapsto 2, l_z \mapsto 3]$, containing
locations $l_x$, $l_y$ and $l_z$, corresponding to variables x, y and z, respectively. Note
that starting from $h$, FOOT gives rise to four executions depending on the +
branches taken at lines 2 and 3. Let us consider the successful execution from
$h$ that first frees y, then frees x (the right branch of + on line 2), and finally
executes skip (the right branch of + on line 3). The footprint of this execution
from $h$ is then given by $(\mathit{ok} : [l_x \mapsto 1, l_y \mapsto 2], [l_x \mapsto \bot, l_y \mapsto \bot])$, denoting an ok
execution from the initial sub-heap $[l_x \mapsto 1, l_y \mapsto 2]$, yielding the final sub-heap
$[l_x \mapsto \bot, l_y \mapsto \bot]$ upon termination. That is, the initial and final sub-heaps in
the footprint do not include the untouched location $l_z$ as it remains unchanged,
and the overall effect of FOOT is obtained from its footprint by adding $l_z \mapsto 3$ to
both the initial and final sub-heaps; i.e., by “fleshing out” the footprint.

Next, consider the execution in which the left branch of + on line 2 is taken,
resulting in a use-after-free error. The footprint of this second execution from $h$
is given by $(\mathit{er}(L2) : [l_y \mapsto 2], [l_y \mapsto \bot])$, denoting an error at L2. Note that as this
execution terminates erroneously at L2, unlike in the first execution, location $l_x$
remains untouched by FOOT and is thus not included in the footprint.

Put formally, let $\mathsf{foot}(.) : \textsc{Comm} \to \textsc{Exit} \to \mathcal{P}(\textsc{State} \times \textsc{State})$ denote
a footprint function such that $\mathsf{foot}(C)\epsilon$ describes the minimal state needed
for some $C$ execution under $\epsilon$: if $((s, h), (s', h')) \in \mathsf{foot}(C)\epsilon$, then $h$ contains
only the locations accessed by some $C$ execution, yielding $h'$ on termination.
In Fig. 7 we present an excerpt of $\mathsf{foot}(.)$, with its full definition given in [38].
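
\[
\begin{array}{rcl}
\mathsf{foot}(C_1 + C_2)\epsilon & \triangleq & \mathsf{foot}(C_1)\epsilon \cup \mathsf{foot}(C_2)\epsilon \\
\mathsf{foot}(L{:}\ \mathtt{free}(x))\mathit{ok} & \triangleq & \{((s, [l \mapsto v]), (s, [l \mapsto \bot])) \mid s(x) = l \wedge v \in \textsc{Val}\} \\
\mathsf{foot}(L{:}\ \mathtt{free}(x))\mathit{er}(L') & \triangleq & \{((s, [l \mapsto \bot]), (s, [l \mapsto \bot])) \mid L = L' \wedge s(x) = l\} \\
 & & \cup\ \{((s, h_0), (s, h_0)) \mid L = L' \wedge s(x) = \mathtt{null}\}
\end{array}
\]


Fig. 7. The foot(.) function (excerpt), where $h_0$ denotes an empty heap ($\mathit{dom}(h_0) = \emptyset$).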


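Our footprint theorem (Theorem 2) then states that any pair $(\sigma_p, \sigma_q)$ resulting
from executing $C$ (i.e., $(\sigma_p, \sigma_q) \in \llbracket C \rrbracket\epsilon$) can be obtained by fleshing out a pair
$(\sigma_p', \sigma_q')$ in the $C$ footprint (i.e., $(\sigma_p', \sigma_q') \in \mathsf{foot}(C)\epsilon$):
$(\sigma_p, \sigma_q) = (\sigma_p' \bullet \sigma_r, \sigma_q' \bullet \sigma_r)$ for some $\sigma_r$.

Theorem 2 (Footprints). For all $C$ and $\epsilon$: $\llbracket C \rrbracket\epsilon = \mathsf{frame}(\mathsf{foot}(C)\epsilon)$, where
$\mathsf{frame}(R) \triangleq \{(\sigma_p \bullet \sigma_r, \sigma_q \bullet \sigma_r) \mid (\sigma_p, \sigma_q) \in R\}$.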
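
The “fleshing out” step can be illustrated on the two FOOT footprints discussed earlier. As in that discussion, the sketch below ignores variable stores; the dictionary encoding of heaps, the string sentinel for $\bot$, and the names `compose` and `flesh_out` are assumptions made for illustration.

```python
BOTTOM = "BOTTOM"   # stands for the deallocated marker


def compose(h1, h2):
    """Disjoint heap union; None when the domains overlap."""
    return None if set(h1) & set(h2) else {**h1, **h2}


def flesh_out(footprint, frame):
    """Add an untouched frame heap to both sides of a footprint triple."""
    exit_cond, pre, post = footprint
    full_pre, full_post = compose(pre, frame), compose(post, frame)
    if full_pre is None or full_post is None:
        return None
    return (exit_cond, full_pre, full_post)


# Footprints of the two FOOT executions from h = [lx -> 1, ly -> 2, lz -> 3].
ok_footprint = ("ok", {"lx": 1, "ly": 2}, {"lx": BOTTOM, "ly": BOTTOM})
er_footprint = (("er", "L2"), {"ly": 2}, {"ly": BOTTOM})

ok_run = flesh_out(ok_footprint, {"lz": 3})
er_run = flesh_out(er_footprint, {"lx": 1, "lz": 3})
```

Both fleshed-out pairs start from the same full heap, but the erroneous run frames on `lx` as well, since that location is never touched before the error at L2.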


We note that our footprint theorem is a positive by-product of the ISL model 
and not the ISL logic. That is, the footprint theorem is an added bonus of the 
heap monotonicity in the ISL model, brought about by negative heap resources, 
and is orthogonal to the notion of under-approximation. As such, the footprint 
theorem would be analogously valid in the original SL model, were we to alter 
its model to achieve heap monotonicity through negative heaps. That said, there 
are important differences with the classic SL theory, which we discuss next. 


4.2 Differences with the Classic (Over-Approximate) Theory 


Existing work [14,40] presents footprint theorems for classical SL based on the 
notion of safe states; i.e., those that do not lead to erroneous executions. This is 
understandable as the informal reasoning which led to the frame rule for SL was 
based on safety [36,45]. According to the fault-avoiding interpretation of an SL 
triple $\{p\}\ C\ \{q\}$, deemed invalid when a state in $p$ leads to an error, if $C$ accesses
a location outside p, then this leads to a safety violation. As such, any location 
not guaranteed to exist in p must remain unchanged, thereby yielding the frame 
rule. The existing footprint theorems were for safe states only. 

By contrast, our theorem considers footprints involving both unsafe and safe 
states. For instance, given the FOOT program and an initial state (e.g., h in 
Sect. 4.1), we distinguished a footprint leading to an erroneous execution (e.g., 
$(\mathit{er}(L2) : [l_y \mapsto 2], [l_y \mapsto \bot])$) from one leading to a safe execution (e.g.,
$(\mathit{ok} : [l_x \mapsto 1, l_y \mapsto 2], [l_x \mapsto \bot, l_y \mapsto \bot])$). This distinction is important, as otherwise
we could not distinguish further bugs that follow a safe execution. To see this, 
consider a second error in FOOT, namely the possible use-after-free of x on line 
3, following a successful execution of lines 1 and 2. 

For reasoning about incorrectness, it is essential that we consider unsafe 
states when accounting for why things work; this is a technical difference with 
the classic footprint results. But it also points to a deeper conceptual difference 
between the correctness and incorrectness theories. Above, we explained how
safety, and its violation, played a crucial role in justifying the frame rule of over- 
approximate SL. However, as we describe below, ISL and its frame rule do not 
rely on safety. 

As shown in [35], an under-approximate triple can be equivalently defined
as: $[p]\ C\ [\epsilon : q] \iff \mathsf{post}(C, p) \supseteq q$, where $\mathsf{post}(C, p)$ describes the states obtained
by executing $C$ on $p$. While this under-approximate definition equivalently
justifies the frame rule, the analogous over-approximate (Hoare) triple obtained by
flipping $\supseteq$ (i.e., $\{p\}\ C\ \{q\} \iff \mathsf{post}(C, p) \subseteq q$) invalidates the frame rule:


\[
\frac{\{\mathsf{true}\}\ [x] := 23\ \{\mathsf{true}\}}
     {\{x \mapsto 17 * \mathsf{true}\}\ [x] := 23\ \{x \mapsto 17 * \mathsf{true}\}}
\ (\text{FRAME})
\]


The premise of this derivation is valid according to the standard interpretation
of over-approximate triples, but its conclusion (obtained by framing on $x \mapsto 17$)
certainly is not, as it states that the value of x remains unchanged after mutation.
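
The failure of the frame rule under the plain $\mathsf{post}(C, p) \subseteq q$ reading can be replayed on a small finite universe of heaps. In this sketch the heap key `"x"` stands directly for the location held in x, the universe of heaps is arbitrary, and all names are assumptions made for illustration.

```python
def store_23(heap):
    """Post-states of [x] := 23; a heap without location 'x' faults and yields none."""
    return [{**heap, "x": 23}] if "x" in heap else []


def hoare_valid(pre, post_cond, universe):
    """{pre} [x] := 23 {post_cond}, read as post(C, pre) being included in post_cond."""
    return all(post_cond(h2) for h in universe if pre(h) for h2 in store_23(h))


universe = [{}, {"x": 17}, {"x": 23}, {"x": 17, "y": 0}]
true = lambda h: True
x17_star_true = lambda h: h.get("x") == 17      # x |-> 17 * true

premise_holds = hoare_valid(true, true, universe)
conclusion_holds = hoare_valid(x17_star_true, x17_star_true, universe)
```

The premise holds trivially, while the framed conclusion fails because every post-state stores 23 at x.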

The frame rule is then recovered by strengthening the $\{p\}\ C\ \{q\}$ interpretation,
either by requiring that executing $C$ on $p$ not fault (fault avoidance),
or by “baking in” frame preservation: $\forall r.\ \mathsf{post}(C, p * r) \subseteq q * r$. Both solutions
then invalidate the premise of the above derivation. We found it remarkable
that our ISL theory is consistent with the technically simpler interpretation of
triples (namely as $\mathsf{post}(C, p) \supseteq q$, the dual of Hoare's interpretation) and
that it supports a simple footprint theorem at once, again in contrast to the
over-approximate theory.


5  Begin-Anywhere, Intra-procedural Symbolic Execution 


ISL lends itself naturally to the definition of forward symbolic execution analy- 
ses. We demonstrate that using the ISL rules, it is straightforward to derive 
a begin-anywhere, intra-procedural analysis that allows us to infer valid ISL 
triples automatically for a given piece of code, with the goal of finding only 
true bugs reachable from an initial state. This is implemented in the intra- 
procedural-only mode of the Pulse analysis inside Infer [18] (accessible by pass- 
ing --pulse --pulse-intraprocedural-only to infer). The analysis follows 
principles from bi-abduction [11], but takes its most successful application — 
bug catching [18] — as the sole objective. This allows us to make a number of 
adjustments and to obtain an analysis that is a much closer fit to the ISL theory 
of under-approximation than the original bi-abduction analysis was to the SL 
theory of over-approximation. 

The original bi-abduction analysis in Abductor [11] and Infer [18] aimed at 
discovering fault-avoiding specifications for procedures. Ideally, one would find 
specifications for all procedures in the codebase, all the way to an entry-point 
(e.g., the main() function), thus proving the program safe. In practice, however,
virtually all sizable codebases have bugs, and known abstract domains are impre- 
cise when proving memory safety for large codebases. As such, specifications were 
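
\[
p, q ::= \mathsf{emp} \mid e \mapsto e' \mid e = e' \mid e \not\mapsto \mid p * q
\qquad \text{(Symbolic Heaps)}
\]

\[
\text{SE-SEQ} \quad
\frac{[p_0]\ C_0\ [\mathit{ok} : q_0]\ C_1 \leadsto [p_1]\ C_0; C_1\ [\epsilon_1 : q_1]
      \qquad
      [p_1]\ C_0; C_1\ [\epsilon_1 : q_1]\ C_2 \leadsto [p_2]\ C_0; C_1; C_2\ [\epsilon_2 : q_2]}
     {[p_0]\ C_0\ [\mathit{ok} : q_0]\ C_1; C_2 \leadsto [p_2]\ C_0; C_1; C_2\ [\epsilon_2 : q_2]}
\]

\[
\text{SE-CHOICE} \quad
\frac{[p_0]\ C_0\ [\mathit{ok} : q_0]\ C_i \leadsto [p_i]\ C_0; C_i\ [\epsilon_i : q_i]}
     {[p_0]\ C_0\ [\mathit{ok} : q_0]\ C_1 + C_2 \leadsto [p_i]\ C_0; (C_1 + C_2)\ [\epsilon_i : q_i]}
\]

\[
\text{SE-STORE} \quad
\frac{q * M \dashv x \mapsto e * F \qquad \mathsf{mod}(C) \cap \mathsf{fv}(M) = \emptyset}
     {[p]\ C\ [\mathit{ok} : q]\ [x] := y \leadsto [p * M]\ C; [x] := y\ [\mathit{ok} : x \mapsto y * F]}
\]

\[
\text{SE-STOREER} \quad
\frac{q \vdash x \not\mapsto * \mathsf{true} \quad \text{or} \quad q \vdash x = \mathtt{null} * \mathsf{true}}
     {[p]\ C\ [\mathit{ok} : q]\ L{:}\ [x] := y \leadsto [p]\ C; L{:}\ [x] := y\ [\mathit{er}(L) : q]}
\]


Fig. 8. Symbolic heaps (above) and selected symbolic execution rules (below).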


found for only 40-70% of the procedures in the experiments of [11]. Nonetheless, 
proof failures, a by-product of proof search, became practically more valuable 
than proofs, as they can indicate errors. Complex heuristics came into play to 
classify proof failures and to report to the programmer those more likely to be 
errors. These heuristics have not been given a formal footing, contributing to 
the gap between the theory of proofs and the practice of bug catching. 

Pulse approaches bug reporting more directly: by looking for them. It infers 
under-approximate specifications, while recording invalidated addresses. If such 
an address is later accessed, a bug is reported soundly, in line with the theory. 


Symbolic Execution. In Fig. 8 we present our symbolic execution as big-step,
syntax-directed inference rules of the form $[p_0]\ C_0\ [\epsilon_0 : q_0]\ C \leadsto [p]\ C_0; C\ [\epsilon : q]$,
which can be read as: “having already executed $C_0$ yielding (discovering) the
presumption $p_0$ and the result $q_0$, then executing $C$ yields the presumption $p$
and result $q$”. As is standard in SL-based tools [4,11], our abstract states consist
of *-conjoined predicates, with the notable addition of the invalidated assertion 
and omission of inductive predicates. The latter are not needed because we never 
perform the over-approximation steps that would introduce them. 

SE-SEQ describes how the symbolic execution goes forward step by step. 
SE-CHOICE describes how the analysis computes one specification per path 
taken in the program. To ensure termination, loops are unrolled up to a fixed
bound $N_{loops}$, borrowing from symbolic bounded model checking [12]. These two
ideas avoid the arduous task of inventing join and widen operators [15]. For added
efficiency, in practice we also limit the maximum number of paths leading to the
same program point to a fixed bound $N_{disjuncts}$. The $N_{loops}$ and $N_{disjuncts}$ bounds
give us easy “knobs” to tune the precision of the analysis. Note that pruning
paths by limiting disjuncts is also sound for under-approximate reasoning [35].
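
The effect of the two bounds can be sketched as plain path enumeration over a small command syntax. This is only an illustration of bounded unrolling and disjunct pruning, not the analysis itself: it tracks sequences of atomic instructions rather than symbolic states, it does not stop a path at an error, and the tuple encoding of commands and the function name `paths` are assumptions of the sketch.

```python
def paths(cmd, n_loops, n_disjuncts):
    """Enumerate at most n_disjuncts paths of cmd, unrolling loops up to n_loops times."""
    kind = cmd[0]
    if kind == "atom":
        result = [[cmd[1]]]
    elif kind == "skip":
        result = [[]]
    elif kind == "seq":
        result = [p1 + p2
                  for p1 in paths(cmd[1], n_loops, n_disjuncts)
                  for p2 in paths(cmd[2], n_loops, n_disjuncts)]
    elif kind == "choice":
        result = (paths(cmd[1], n_loops, n_disjuncts)
                  + paths(cmd[2], n_loops, n_disjuncts))
    elif kind == "star":
        body = paths(cmd[1], n_loops, n_disjuncts)
        result, frontier = [[]], [[]]
        for _ in range(n_loops):
            # Extend every path found so far by one more loop iteration.
            frontier = [p + b for p in frontier for b in body][:n_disjuncts]
            result += frontier
    else:
        raise ValueError(kind)
    return result[:n_disjuncts]           # prune: keep only the first paths


# free(y); (free(y) + free(x)); (free(x) + skip)
foot = ("seq", ("atom", "free(y)"),
        ("seq", ("choice", ("atom", "free(y)"), ("atom", "free(x)")),
                ("choice", ("atom", "free(x)"), ("skip",))))

all_paths = paths(foot, 1, 10)
pruned = paths(foot, 1, 2)
unrolled = paths(("star", ("atom", "a")), 2, 10)
```

Pruning only ever discards paths, so the pruned list is a subset of the full one; this mirrors why limiting disjuncts cannot introduce false reports in an under-approximate setting.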

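To analyze a program $C$, we start from $C_0 = \mathtt{skip}$ and produce
$[\mathsf{emp}]\ \mathtt{skip}\ [\mathit{ok} : \mathsf{emp}]\ C \leadsto [p]\ \mathtt{skip}; C\ [\epsilon : q]$.
As $\vdash [\mathsf{emp}]\ \mathtt{skip}\ [\mathit{ok} : \mathsf{emp}]$ holds and symbolic
execution rules preserve validity, we then obtain valid triples for $C$ by Theorem 3.


Theorem 3 (Soundness of Symbolic Execution). If $\models [p_0]\ C_0\ [\epsilon_0 : q_0]$ and
$[p_0]\ C_0\ [\epsilon_0 : q_0]\ C \leadsto [p]\ C_0; C\ [\epsilon : q]$, then $\models [p]\ C_0; C\ [\epsilon : q]$.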


Symbolic execution of individual commands follows the derived SYMBEXEC
rule below, with the side-condition that
$\mathsf{mod}(C_0) \cap \mathsf{fv}(M) = \mathsf{mod}(C) \cap \mathsf{fv}(F) = \emptyset$:


\[
\text{SYMBEXEC} \quad
\dfrac{\dfrac{[p_0]\ C_0\ [\mathit{ok} : q_0]}{[p_0 * M]\ C_0\ [\mathit{ok} : q_0 * M]}
       \qquad q_0 * M \dashv p * F \qquad
       \dfrac{[p]\ C\ [\epsilon : q]}{[p * F]\ C\ [\epsilon : q * F]}}
      {[p_0 * M]\ C_0; C\ [\epsilon : q * F]}
\]


If executing $C_0$ yields the presumption $p_0$ and the current state $q_0$, then
SYMBEXEC allows us to execute the next command $C$ with specification
$[p]\ C\ [\epsilon : q]$. This may 1) materialize a state $M$ that is missing from $q_0$ (and is needed
to execute $C$); and 2) carry over an unchanged frame $F$. The unknowns $M$ and
$F$ in the bi-abduction question $p * F \vdash q_0 * M$ have analogous counterparts in
over-approximate bi-abduction; but, as in the CONS rule, their roles have flipped:
the frame $F$ is abduced, while the missing $M$ is framed (or anti-abduced).
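
For symbolic heaps that consist only of points-to cells with syntactically matching addresses, the bi-abduction question has a direct set-difference solution, sketched below. This is a deliberately simplified stand-in for a real solver: it performs no unification of logical variables and ignores pure equalities, and the dictionary encoding and the name `biabduce` are assumptions made for illustration.

```python
def biabduce(current, required):
    """Find (missing, frame) such that required * frame equals current * missing.

    Symbolic heaps are dicts from addresses to contents. Returns None when a
    cell present on both sides has conflicting contents (a real solver would
    try to unify the contents instead of giving up).
    """
    for loc in current.keys() & required.keys():
        if current[loc] != required[loc]:
            return None
    missing = {l: v for l, v in required.items() if l not in current}
    frame = {l: v for l, v in current.items() if l not in required}
    return missing, frame


# Starting from emp, a load from v needs the cell v |-> u to be materialized.
first_step = biabduce({}, {"v": "u"})

# A cell already present is reused; unrelated cells are carried over as the frame.
current, required = {"v": "u", "y": "1"}, {"v": "u", "z": "w"}
missing, frame = biabduce(current, required)
```

The first query returns the missing cell with an empty frame; in the second, the two sides of the question coincide once the missing part and the frame are added.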


Bi-abduction and ISL. Bi-abduction is arguably a better fit for ISL than 
SL: in SL adding the missing M to the overall precondition po is only valid for 
straight-line code, and not across control flow branches. Intuitively, there is no 
guarantee that a safe precondition for one path is safe for the other. This is 
especially the case in the presence of non-determinism or over-approximation 
of Boolean conditions, where one cannot find definitive predicates to force the 
analysis down one path. It is thus necessary to re-execute the whole procedure 
on the inferred preconditions, eliminating those that are not safe for all paths. 
By contrast, in our setting SE-CHOICE is sound, and this re-execution is not 
needed! 

We allow the analysis to abduce information only for successful execution; 
erroneous executions have to be manifest and realizable using only the informa- 
tion at hand. We do this by requiring M to be emp in SYMBEXEC when applied 
to error triples. We go even further and require that the implication be in both
directions, i.e., that the current state force the error; note that if $q \vdash x \not\mapsto * \mathsf{true}$
then there exists $F$ such that $x \not\mapsto * F \vdash q$, and similarly for $q \vdash x = \mathtt{null} * \mathsf{true}$.
This is a practical choice and only one of many ways to decide where to report,
trying to avoid blaming the code for issues it did not itself cause. For instance,
thanks to this restriction, we do not report on $[x] := 10$ (which has error
specifications through STOREER and STORENULL) unless a previous instruction actively
invalidated x. This choice also chimes well with the fact that the analysis can 
start anywhere in a program and give results relevant to the code analyzed. 
Solving the bi-abduction entailment in SYMBEXEC can be done using the
techniques developed for SL [11, §3]. We do not detail them here as they are 
straightforwardly adapted to our simpler setting without inductive predicates. 


Finding a Bug in client, Automatically. We now describe how Pulse automatically
finds a proof of the bug in the unannotated code of client from
Fig. 3, by automatically applying the only possible symbolic execution rule at
each step. Starting from $\mathsf{emp}$ and going past the first instruction $x := [v]$ requires
solving $v \mapsto u * F \vdash \mathsf{emp} * M$. The bi-abduction entailment solver then answers
with $F = \mathsf{emp}$ and $M = v \mapsto u$, yielding the inferred presumption $v \mapsto u$
and the next current state $v \mapsto u * x = u$. The next instruction is the call to
push_back(v). For ease of presentation, let us consider this library call as an
axiomatized instruction that has been given the specification in Fig. 3. This
corresponds to writing a model for it in the analyzer, which is actually the case in
the implementation, although the analysis would work equally well if we were to
inline the code inside client. Applying SYMBEXEC requires solving the entailment
$v \mapsto a * a \mapsto w * F \vdash v \mapsto u * x = u * M$. The solver then answers with the
solution $F = (x = u * a = u)$ and $M = u \mapsto w$. Finally, the following instance of
SE-STOREER is used to report an error, where $C = \mathtt{skip}; x := [v]; \mathtt{push\_back}(v)$
and $q_{rx} = v \mapsto a' * a' \mapsto w * a \not\mapsto * x = u * a = u$:


\[
\begin{array}{l}
[v \mapsto u * u \mapsto w]\ C\ [\mathit{ok} : q_{rx}]\ L_{rx}{:}\ [x] := 88 \\
\quad \leadsto [v \mapsto u * u \mapsto w]\ C; L_{rx}{:}\ [x] := 88\ [\mathit{er}(L_{rx}) : q_{rx}]
\end{array}
\]


Preliminary Results. Our analysis handles the examples in this paper, modulo 
function inlining. While our analysis shows how to derive a sound static analysis 
from first principles, it does not yet fully exploit the theory, as it does not handle 
function calls, and in particular summarization. Under-approximate triples pave 
the way toward

processors do not provide sufficient means to close all leakage, e.g., shared state cannot
be cleaned properly on a context switch [22]. Finally, it has been shown that 
fixes relying on too specific assumptions can be circumvented by modifying the 
attack [43], and that attacks are possible even against formally verified software 
if the underlying processor model is unsound [28]. For these reasons, validation
of formal models by directly measuring the hardware is of great importance. 


9 Concluding Remarks 


We presented Scam-V, a framework for automatic validation of observational 
models of side channels. Scam-V uses a novel combination of symbolic execution, 
relational analysis, and observational models to generate experiments. We eval- 
uated Scam-V on the ARM Cortex-A53 processor and we invalidated all models 
of Sect. 2.3, i.e., those with observations that are cache-line-offset-independent. 

Our results are summarized as follows: (i) in case of cache partitioning, the 
attacker can discover victim accesses to the other cache partitions due to the 
automatic data prefetcher; (ii) the Cortex-A53 prefetcher seems to respect 4K 
page boundaries, like in some Intel processors; (iii) a mechanism of Cortex-A53, 
which we called previction, can leak the time between accesses to the same cache 
set; (iv) the cache state is affected by the cache line offset of the accesses, prob- 
ably due to undocumented cache bank collisions like in some AMD processors; 
(v) the formal ARMv8 model had a flaw in the implementation of CBNZ; (vi) 
our implementation of the observational model had a flaw in case of loads into 
the constant zero register. Moreover, since the microarchitectural features that 
lead to these findings are also available on other ARMv8 cores, including some 
that are affected by Spectre (e.g. Cortex A57), it is likely that similar behaviors 
can be observed on these cores, and that more powerful observational models, 
including those that take into account Spectre-like effects, may also be unsound. 

These promising results show that Scam-V can support the identification of 
undocumented and security-relevant features of processors (like results (ii), (iii), 
and (iv)) and discover problems in the formal models (like results (v) and (vi)). 
In addition, users can drive test-case generation to conveniently explore classes 
of programs that they suspect would lead to side-channel leakage (like in result 
(i)). This process is enabled by path and term enumeration techniques as well 
as custom program generators. Moreover, Scam-V can aid vendors to validate 
implementations with respect to desired side-channel specifications. 

Given the lack of vendor communication regarding security-relevant proces- 
sor features, validation of abstract side-channel models is of critical importance. 
As a future direction of work, we are planning to extend Scam-V for other archi- 
tectures (e.g. ARM Cortex-M0 based microcontrollers), noisy side channels (e.g. 
time and power consumption), and other side channels (e.g. cache replacement 
state). Moreover, we are investigating approaches to automatically repair an 
unsound observational model starting from the counterexamples, e.g., by adding 
state observations. Finally, the theory in Sect. 4 can be used to develop a
certifying tool for verifying observational determinism.


Abstract. Geo-replicated systems provide a number of desirable prop-
erties such as globally low latency, high availability, scalability, and built- 
in fault tolerance. Unfortunately, programming correct applications on 
top of such systems has proven to be very challenging, in large part 
because of the weak consistency guarantees they offer. These complex- 
ities are exacerbated when we try to adapt existing highly-performant 
concurrent libraries developed for shared-memory environments to this 
setting. The use of these libraries, developed with performance and
scalability in mind, is highly desirable. However, identifying a suitable
notion of correctness to check their validity under a weakly consistent
execution model has not been well studied, in large part because it is
problematic to naively transplant criteria such as linearizability, which
have a useful interpretation in a shared-memory context, to a distributed
one where the cost of imposing a (logical) global ordering on all actions
is prohibitive.
In this paper, we tackle these issues by proposing appropriate semantics 
and specifications for highly-concurrent libraries in a weakly-consistent, 
replicated setting. We use these specifications to develop a static analysis 
framework that can automatically detect correctness violations of library 
implementations parameterized with respect to the different consistency 
policies provided by the underlying system. We use our framework to 
analyze the behavior of a number of highly non-trivial library imple- 
mentations of stacks, queues, and exchangers. Our results provide the 
first demonstration that automated correctness checking of concurrent 
libraries in a weakly geo-replicated setting is both feasible and practical. 


1 Introduction 


Geo-replicated systems maintain multiple copies of data at different locations 
and provide a number of attractive properties such as globally uniform low 
access-latency, always-on availability, fault tolerance, and improved scalability. 
Applications with a geo-distributed user base need to necessarily run on top of 
replicated systems to ensure fast and always-available service. On the other hand, 
due to concurrent updates at different replicas and the possibility of arbitrary 
re-ordering of updates by the underlying network, replicated systems typically
guarantee a very weak form of consistency called eventual consistency [4], which
only requires replicas that have received the same set of updates to exhibit
the same state. Because this guarantee is often too weak to satisfy an appli- 
cation’s correctness requirements, a number of (stronger) consistency policies 
have emerged in recent years; these policies offer session [39], causality [27] or 
transactional [13] guarantees, and constrain system behavior by imposing addi- 
tional synchronization on actions. Nonetheless, writing correct applications in 
this environment using these policies remains a challenging problem. 

Having a library of performant and correct data structure implementations 
developed with replication and geo-distribution in mind can significantly allevi- 
ate the problem of writing correct applications, as demonstrated by the availabil- 
ity of highly popular concurrent library implementations developed for shared- 
memory systems [21,33]. CRDTs [36] (Conflict-Free Replicated Data Types) 
offer an analog of such implementations for geo-replicated environments.
However, using CRDTs to build useful data structure libraries is challenging because
the strong requirements imposed by CRDTs (namely that all operations
commute with each other) appear satisfiable only for simple objects such as sets,
lists, or maps. Important data structures such as stacks, queues, or exchangers 
that serve as building blocks for many concurrent and distributed algorithms 
have eluded implementations using CRDTs. Even when a data structure can 
be expressed in this way, reasoning about its correctness is typically given in 
terms of non-standard criteria such as replicated data type specifications [12], 
convergence [31] or replication-aware linearizability [41], concepts that are likely 
to be difficult for programmers to grasp, especially when contrasted with well- 
established notions such as linearizability used to reason about shared-memory 
concurrency. This state of affairs has made it difficult to seamlessly adapt and 
exploit ongoing progress in the development of scalable and correct concurrent 
algorithms used in the shared-memory world to a geo-replicated setting. 

In order to bridge this gap, we study how to automatically transplant concur- 
rent library implementations developed for shared memory systems to replicated 
ones. Doing so would allow us to use carefully-crafted implementations which 
have been proven to run correctly in shared memory environments, thereby sim- 
plifying the task of building distributed replication-aware applications. How- 
ever, realizing this goal poses a number of challenges, the most critical of which 
is the widely different memory consistency models used in the two domains: 
the eventually consistent memory model typically provided by a replicated sys- 
tem is significantly weaker than the sequential consistency guarantees offered 
by shared-memory. Consistency policies offering session, causal, or transactional 
guarantees must be additionally considered to facilitate correct behavior. This 
requires enriching the semantics of existing library implementations to take into 
account the consistency policy of the underlying replicated system. Furthermore, 
the de facto correctness criterion for concurrent library implementations is lin- 
earizability, which is clearly too restrictive to be directly applied to this much 
weaker setting, since it demands that any correct execution be equivalent to 
some sequential execution of a reference implementation. Such a requirement 
is problematic in a geo-replicated environment where the cost of coordination
to enforce a global ordering of all actions is prohibitive. These observations are 
similar to those made by Raad et al. [34] who considered the applicability of lin- 
earizability in a weak memory context, a scenario that faces similar challenges 
to our own. To address these issues, we therefore consider alternative declara- 
tive specifications of data structures, based on axiomatic definitions [17], that 
are roughly equivalent to the guarantees provided by linearizability (and hence 
familiar to programmers), but suitably relaxed to take into account the weak 
behaviors admitted by replicated systems. 

We then propose an automated approach to find bounded violations of these 
declarative specifications given an implementation and a consistency policy. Due 
to the non-deterministic nature of replicated systems, manifesting violations in 
actual executions requires (1) a specific combination of library methods to be 
called (2) with specific argument values and (3) a specific interaction of low-level 
read/write events. Indeed, existing approaches to checking application safety 
under weak consistency [24] potentially involve long (on the order of hours) and 
costly execution runs to offer meaningful assurance on application correctness 
given the large space of possible behaviors that can be exhibited. 

In contrast to testing approaches, our analysis framework directly searches 
for an execution violating a specification, and in the process constructs the com- 
bination of library methods to be called as well as their argument values, and 
the low-level read/writes which can lead to the violation. Moreover, because our 
analysis is parametric in the choice of consistency policy, we can constrain the 
search for violating executions on-demand as per the chosen policy. We addi- 
tionally show how our technique is capable of expressing complex correctness 
specifications of libraries (see Sect. 3.4) and how it can be used to automati- 
cally find violations in the face of this complexity. The analysis is sound in that 
it only reports actual violations. Notably, our experiments manifest a number 
of non-trivial and complex violating executions for realistic concurrent libraries 
which require intricate interaction with library methods. We were also able to 
analyse application behavior under different consistency policies, and in partic- 
ular, were able to find the weakest consistency policy to eliminate a particular 
violation. Our analysis is based on developing an efficient encoding of the imple- 
mentation, the consistency policy, and the correctness specification as first-order 
logic formulae which can be dispatched to off-the-shelf SMT solvers to find viola- 
tions. Unlike random testing approaches, our technique is capable of identifying 
non-trivial subtle safety violations in the order of minutes, making it feasible to 
use not only for finding violations, but also for checking the feasibility of any 
proposed remediations. We make the following major contributions: 


1. We propose a novel operational semantics for replicated systems parameter- 
ized under realistic consistency policies which can be used to describe execu- 
tions of sophisticated concurrent library implementations. 

2. We demonstrate how to adapt existing specification frameworks developed 
for concurrent libraries on shared memory systems to replicated systems with 
minimal changes. 
3. We describe an automated bounded verification procedure to detect violations
of such specifications for implementations intended to execute under a given 
consistency policy. 

4. We catalog the results of applying our analysis on a number of well-studied 
implementations including stacks, queues and exchangers, on a commercial 
replicated store (Cassandra), demonstrating empirically that our correctness 
checking procedure is useful in practice. 


The remainder of the paper is organized as follows. In the next section, we pro- 
vide a motivating example to illustrate the challenges of reasoning about concur- 
rent libraries in a weakly-consistent replicated environment. Section 3 formalizes 
the language used to write library implementations and the specifications that 
characterize their intended behavior. Section 4 describes our bounded verifica- 
tion procedure and provides details about how we encode extracted verification 
conditions. Section 5 describes experimental results and presents case studies to 
illustrate the effectiveness of our approach. Related work and conclusions are 
given in Sect. 6. 


2 Illustrative Example 


push(v){                      pop(){
1:  n = New(Node);              while (true){
2:  n.Val = v;                6:  t = Top;
    while (true){                 if (t == NULL)
3:    t = Top;                      return EMPTY;
4:    n.Next = t;             7:  v = t.Val;
5:    if (CAS(Top, t, n))     8:  n = t.Next;
        break;                9:  if (CAS(Top, t, n))
    }                               return v;}
}                             }


Fig. 1. Treiber Stack 


In this section, we illustrate the various issues that arise when running stan- 
dard concurrent library implementations on replicated systems. Figure 1 shows 
the implementation of a Treiber stack, suitably adapted to execute in a replicated 
environment. The Treiber stack provides two methods (push and pop) to clients, 
and stores the elements of the stack in a linked list, with the order of elements 
in the list corresponding to the order in which elements are pushed. Since repli- 
cated stores typically offer a database or a key-value store interface, we store the 
linked list as a table of type Node with columns Val and Next, where each row 
stores a node of the linked list, with Val storing the value and Next storing the id 
of the next node. Top contains the id of the Node row which is the current top of the
stack (Top is initialized with the special value NULL indicating an empty stack).
In Fig. 1, variables denoted by lower-case letters are assumed to be stored locally
and are not replicated. New(Node) returns the id of a new row in the Node table.
CAS(Top, t, n) is the typical Compare-And-Swap operation which atomically
compares Top to t, and if it is equal to t then updates it to n.¹
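To make the table layout concrete, the following Python sketch mirrors Fig. 1 on a single-copy stand-in for the store. The `Store` class, its `new_node` and `cas_top` helpers, and the dictionary encoding of the Node table are assumptions made for illustration only; no replication is modelled here, so this version behaves like an ordinary sequential stack.

```python
EMPTY = "EMPTY"
NULL = None


class Store:
    """Hypothetical single-copy stand-in for the store (no replicas)."""

    def __init__(self):
        self.node = {}       # Node table: row id -> {"Val": ..., "Next": ...}
        self.top = NULL      # Top: id of the row that is the current stack top
        self._next_id = 0

    def new_node(self):
        """New(Node): allocate a fresh row and return its id."""
        self._next_id += 1
        self.node[self._next_id] = {"Val": 0, "Next": NULL}
        return self._next_id

    def cas_top(self, expected, new):
        """CAS(Top, expected, new): update Top only if it equals `expected`."""
        if self.top == expected:
            self.top = new
            return True
        return False


def push(store, v):
    n = store.new_node()               # line 1
    store.node[n]["Val"] = v           # line 2
    while True:
        t = store.top                  # line 3
        store.node[n]["Next"] = t      # line 4
        if store.cas_top(t, n):        # line 5
            break


def pop(store):
    while True:
        t = store.top                  # line 6
        if t is NULL:
            return EMPTY
        v = store.node[t]["Val"]       # line 7
        n = store.node[t]["Next"]      # line 8
        if store.cas_top(t, n):        # line 9
            return v
```

On this single-copy store, pushing 1 and then 2 makes successive pops return 2, 1 and then EMPTY.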

Clients of concurrent libraries issue invocations of a data structure's methods,
possibly at different replicas, with invocations being grouped together into
sessions, with each session containing invocations issued by the same client.
Whenever a method is invoked, the underlying implementation of the method
is executed; we assume the various reads and writes performed by the method
may possibly be executed at different replicas. All low-level operations
performed by the same invocation are defined to be in the same
session (i.e. the session of the parent invocation). Notice that the implementation
stores data across a number of locations (e.g. Top or a cell in the Node table), each
of which is operated on independently through low-level read/write/CAS operations.
The replicated store only guarantees eventual consistency, which means
that the values stored at all locations eventually converge across all replicas.
However, users expect the behavior of the library to conform to the specification
of the stack data structure, regardless of when and how updates propagate
across replicas.

Fig. 2. An execution of Treiber Stack on a replicated store

[Fig. 2 diagram. PUSH(1): 2: W(L.Val, 1), 5: W(Top, L). POP: 6: R(Top): L, 7: R(L.Val): 0.]

Consider the following basic specification (adapted from the AddRem axiom
in [17]), which simply says that any value returned by a POP operation must
have been pushed by some PUSH operation in the execution; observe that the
specification does not allude to any specific system-level issues related to
replication or weak consistency:

\[
\forall \gamma.\ \mathrm{meth}(\gamma) = \mathrm{POP} \wedge \mathrm{ret}(\gamma) \neq \mathrm{EMPTY} \Rightarrow \exists \gamma'.\ \mathrm{meth}(\gamma') = \mathrm{PUSH} \wedge \mathrm{arg}(\gamma') = \mathrm{ret}(\gamma)
\]


Consider the execution shown in Fig. 2 that involves an invocation of PUSH(1) 
and POP from two different replicas. Among the many operations that the imple- 
mentation of PUSH performs, we show only two write operations in the figure 
(along with line numbers referring to the implementation in Fig. 1), namely the 
write to the Val field of location L (L is the id of the new Node), and the write 
to Top as a result of the successful CAS. Similarly, for the POP operation, we 
show the read to Top, and then the read to the Val field. In the execution, the 
write to Top propagates from replica R1 to R2 before the read, but the write to 


1 CAS operations are typically supported in replicated systems by providing trans- 
actional guarantees to a group of operations; e.g., lightweight transaction support 
provided in Cassandra [26]. 
Val does not, so that POP sees that a new node has been pushed but does not
read the value that was actually pushed, instead returning the initial value of the 
location, thus breaking the specification described above. Eventual consistency 
only guarantees that eventually, the write to Val will also be propagated to R2, 
which is not sufficient to guarantee the specification holds under all executions. 
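The scenario of Fig. 2 can be replayed with a small Python sketch. The `Replica` class and the `run_fig2` driver are hypothetical names introduced here, and message delivery is reduced to explicit `apply` calls; the sketch only illustrates the propagation order described above and is not the store model formalized in Sect. 3.

```python
INITIAL = 0  # initial value of every location, as in Fig. 2


class Replica:
    """Hypothetical replica holding a local copy of the locations."""

    def __init__(self):
        self.data = {}

    def read(self, loc):
        return self.data.get(loc, INITIAL)

    def apply(self, write):
        loc, val = write
        self.data[loc] = val


def run_fig2(deliver_val_before_read):
    """Replay PUSH(1) at R1 and POP at R2; return the value POP reads."""
    r1, r2 = Replica(), Replica()
    w_val = (("L", "Val"), 1)   # line 2 of PUSH(1): W(L.Val, 1)
    w_top = ("Top", "L")        # line 5 of PUSH(1): W(Top, L)
    r1.apply(w_val)
    r1.apply(w_top)

    r2.apply(w_top)             # the write to Top reaches R2 first
    if deliver_val_before_read:
        r2.apply(w_val)         # order a Monotonic Writes store would enforce

    t = r2.read("Top")          # line 6 of POP: R(Top) : L
    v = r2.read((t, "Val"))     # line 7 of POP: R(L.Val)

    r2.apply(w_val)             # eventual delivery; too late for the read above
    return v
```

With `run_fig2(False)` the pop reads the initial value 0 although 0 was never pushed, whereas `run_fig2(True)` reads 1.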

One way to avoid this counterexample would be to ensure that the write to 
Val field by PUSH is propagated to another replica before the write to Top, thus 
guaranteeing that it would be available to the read of Val by POP. Notice that 
the write to Val occurs before the write to Top in the same session, and hence we 
can use session guarantees to ensure the required behavior. In particular, under 
a Monotonic Writes (MW) consistency policy, writes are always propagated in 
their session order to all replicas [1]. However, MW is not sufficient by itself 
to eliminate the counterexample since the reads to Top and Val by POP may 
occur at different replicas, so that the read to Val may occur at a replica in 
which none of the writes by PUSH have propagated. Hence, we also need to have 
these operations execute under a Monotonic Reads (MR) consistency policy that 
mandates all writes witnessed by an operation will also be witnessed by later 
operations in the same session.²


PUSH(1)            PUSH(2)            POP : 2            POP : 0
2: W(L1.Val, 1)    3: R(Top) : L1     6: R(Top) : L2     6: R(Top) : L1
5: W(Top, L1)      5: W(Top, L2)      9: W(Top, L1)      7: R(L1.Val) : 0


Fig. 3. A violation of AddRem by Treiber Stack under MW+MR 


Hence, a combination of MW+MR prevents the counterexample in Fig. 2, 
but it is unfortunately not enough to guarantee the AddRem specification is 
correctly enforced. Consider the execution in Fig. 3 which involves four method 
invocations (2 Pushes and 2 Pops), where each invocation occurs on a different 
replica. Again, we only show some relevant low-level operations performed by 
these invocations, with arrows from write to read operations showing reads-from 
(rf) dependencies. In the execution, after the two pushes, 2 is stored on the top
of stack at Node L2. Thus, the first Pop operation returns 2 and sets the Top
to point at L1, which is then read by the second Pop. However, MW+MR only
guarantees that all write operations performed by the first Pop will be witnessed
by the second Pop. Hence, just like in Fig. 2, the second Pop operation may see
the node at location L1 but not the write to the Val field (which was performed
by PUSH(1)), resulting in violation of the specification. To avoid this, it must be
guaranteed that the write to L1.Val by PUSH(1) is visible to its read by
the second Pop (depicted by the two boxes in Fig. 3). This can be guaranteed
by the Write Follows Read (WFR) policy, which analogously to MW, ensures


2 We formalize all consistency policies used in the paper in the next section. 
that writes witnessed in a session are propagated to all replicas before writes of
the session itself (as opposed to MW which only ensures that writes performed 
in a session are propagated in session order). We note that both the violations 
described above (along with their repairs) were automatically discovered using 
our proposed methods, which devised solutions significantly less expensive than 
imposing strong consistency (aka global coordination) on all accesses. 

While MW+MR+WFR is required to ensure AddRem in a Treiber Stack, we
found that weaker consistency policies (including Eventual Consistency) were 
sufficient for other properties and benchmarks (more details are provided in 
Sect. 5). 


3 Semantics and Specifications 


In this section, we define a simple language to write library implementations 
that is nonetheless powerful enough to express a number of real-world imple- 
mentations. We then define an operational semantics to express executions of 
any implementation written in this language on top of a replicated store. A key 
feature of this operational semantics is that it is parametric in the consistency 
policies available to the store. Thus, instantiating the semantics with different 
consistency policy definitions allows us to reason about library behavior under 
replicated stores providing different consistency guarantees. Another important 
feature of the semantics is that it abstracts out low-level operational details such 
as the number of replicas, the specific manifestation of how message sends and 
receives are implemented, etc., and instead uses a succinct representation involv- 
ing read and write events (and various binary relations among them) to capture 
salient characteristics sufficient to reason about library correctness with respect 
to consistency properties. The proposed semantics facilitates a bounded verifi- 
cation approach that is parametric in the consistency policy, and also matches 
very well with existing axiomatic approaches to specify correctness of library 
implementations in shared memory systems. 

First, we define a simple imperative language in which implementations can 
be written: 


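\[
\begin{array}{l}
v \in \mathtt{LocalVar} \qquad l \in \mathtt{Locations} \qquad n \in \mathbb{V} \\
\oplus \in \{+, -, \times, /\} \qquad \odot \in \{<, \le, =, >, \ge\} \qquad \circ \in \{\wedge, \vee\} \\
e := e \oplus e \mid v \mid n \\
b := b \circ b \mid e \odot e \\
c := v = e \mid v = l \mid l = e \mid \mathtt{If}\ b\ \mathtt{then}\ c\ \mathtt{else}\ c \\
\quad \mid c ; c \mid \mathtt{while}\ b\ \mathtt{do}\ c \mid v = \mathtt{CAS}(l, e_1, e_2) \\
\quad \mid \mathtt{return}\ e \mid \mathtt{return}
\end{array}
\]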


The only difference between standard shared-memory programs and those 
written in the above language is that read and write operations can now be 
performed on either Locations, which are replicated, or local variables which 
are not. As we saw in Sect. 2, replicated Locations can in general refer to 
any field in any table. Let $P$ be the set of programs ($c$) generated using the
above grammar. A library $L = (M, I)$ consists of a set of methods ($M$) and an
implementation function $I : M \to P$. For simplicity, we assume that each method
takes as input one argument. Assume that $I(m)$ contains the free variable $a$ that
stores the input argument. Let $\mathbb{V}$ be the value domain for arguments and return
values. We designate a special value $\bot \in \mathbb{V}$ for the cases where the argument or
return value is empty.

The methods of a library implementation L can be invoked any number of 
times by multiple clients. Invocations from the same client are grouped together 
into sessions, where each session consists of a sequence of method invocations. 
Following standard terminology, given a set of sessions S, an interaction between 
clients and the library is expressed as a history, $h : S \to (M \times \mathbb{V})^*$, which simply
associates a sequence of method invocations to each session. An execution of
the history corresponds to executing the library implementation of each method 
in the history on the replicated store. The store constrains the behavior of reads, 
writes and CAS operations to replicated Locations through its consistency 
policy. 

We now formally define the operational semantics of a history on a replicated 
store that is parametric in a consistency policy $\Psi$. While the history only associates
arguments with method invocations, executing it on the replicated store
will give rise to an abstract execution, which will also associate return values
with invocations, and whose correctness we are interested in checking. Given a
history $h$, library $L$, and consistency policy $\Psi$, we define our semantics in terms
of a labeled transition system (LTS) $\mathcal{M}_{h,L,\Psi} = (\Phi, \mathcal{E}, \rightarrow)$, where $\Phi$ denotes a set
of states, $\mathcal{E}$ denotes a set of events (also used as labels) and $\rightarrow \subseteq \Phi \times \mathcal{E} \times \Phi$ defines
a transition relation over states and events.

Each state in $\Phi$ is specified as a tuple $(\chi, h', \mu, c, \alpha)$. $\chi$ denotes the replicated
store state and consists of read/write/update events to Locations and various
relations among them (described in detail later); $h' : S \to (M \times \mathbb{V})^*$ denotes
the continuation of the history, i.e., the remaining history yet to be executed;
$\mu : S \to (\mathtt{LocalVar} \to \mathbb{V})$ denotes the local variables map for each session;
$c : S \to P$ denotes the continuation of the current invocation for each session,
i.e., the implementation of the current invocation for each session that is yet to
be executed; and $\alpha$ denotes the abstract execution. Each event $\sigma \in \mathcal{E}$ is a tuple
$(i, s, a)$, where $i$ is a unique event-id, $s \in S$ is the session from which the event
originated, and $a$ is the action to the replicated store (either read $R(l, n)$, write
$W(l, n)$ or update $U(l, m, n)$). Given an event $\sigma = (i, s, a)$, $act(\sigma)$ denotes the
action $a$, and $loc(\sigma)$ denotes the location that is the subject of the action.


3.1 Language Semantics 


To simplify the presentation, we decouple the semantics of the language from 
the semantics of the replicated store. The language is defined via a standard 
imperative semantics except that there are no constraints on reads to replicated 
locations (i.e., we do not mandate a specific replica that is targeted by the 
read), and every operation to a replicated location generates an event. These 
rules do not concern the replicated store state, and hence are of the form
$(h_1, \mu_1, c_1, \alpha_1) \xrightarrow{\sigma} (h_2, \mu_2, c_2, \alpha_2)$ (i.e. omitting $\chi$ from $\Phi$). We essentially pick
any session and then execute the next operation from the current invocation in 
the session, or initiate the next invocation in the session if there is no invocation 
currently running. As an illustration, consider the following rule L-READ: 


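\[
\frac{c(s) = (v = l;\ c') \qquad \sigma = (i, s, R(l, n)) \qquad \text{fresh } i}
     {(h', \mu, c, \alpha) \xrightarrow{\sigma} (h', \mu[s \mapsto \mu(s)[v \mapsto n]], c[s \mapsto c'], \alpha)}
\]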


The rule picks the next operation in session $s$, which is a read operation to
location $l$, and generates the read event $\sigma$ reading value $n$ from $l$. It updates
the local variable $v$ to this value, leaving the yet-to-be-executed history ($h'$)
and abstract execution ($\alpha$) unchanged. Write statements (i.e. $l = n$) generate
write events ($W(l, n)$), successful CAS statements (i.e. $v = \mathtt{CAS}(l, m, n)$) generate
update events ($U(l, m, n)$), and unsuccessful CAS statements generate read events
($R(l, m')$). The complete set of rules can be found in the technical report [32].


3.2 Abstract Execution Semantics 


An abstract execution $\alpha = (\Gamma, so_\Gamma)$ maintains a set of method invocation
events $\Gamma$ and a session order relation $so_\Gamma$ among these events. Each method
invocation event $\gamma \in \Gamma$ is a tuple $(i, m, a, r, s)$ where $i$ is a unique event-id,
$m \in M$ is a method of the library, $a, r \in \mathbb{V}$ are the method argument and
return values respectively and $s \in S$ is the session from which the method was
called. We use the notation $\Gamma^s$ for the subset of $\Gamma$ which only contains method
invocation events that originate in session $s$. The following rule (L-RETURN-VAL)
describes the generation of a method invocation event, which occurs on
encountering a return statement during execution, and which is added to the
abstract execution.

\[
\frac{\begin{array}{c}
c(s) = (\mathtt{return}\ e;\ c') \qquad h'(s) = m(k) \cdot h'' \qquad [\![e]\!]_{\mu(s)} = n \\
\alpha = (\Gamma, so_\Gamma) \qquad \gamma = (i, m, k, n, s) \qquad \alpha' = (\Gamma \cup \{\gamma\},\ so_\Gamma \cup \Gamma^s \times \{\gamma\})
\end{array}}
{(h', \mu, c, \alpha) \rightarrow (h'[s \mapsto h''], \mu, c[s \mapsto \epsilon], \alpha')}
\]


The rule updates the yet-to-be executed history $h'$ by removing the current
invocation $m(k)$ (since this invocation has now completed), updates the abstract
execution $\alpha$ to now include the newly completed invocation, and updates the
current invocation implementation to empty. Note that $[\![e]\!]_{\mu(s)}$ denotes the evaluation
of the expression $e$ under the local variable map $\mu(s)$. When the history
$h'$ becomes empty, i.e. there are no more method invocations to be executed, the
abstract execution becomes complete and would include all method instances
present in the original history $h$. Note that this rule does not generate any
read/write/update event.
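The bookkeeping performed by L-RETURN-VAL can be sketched in Python as follows. The encoding is an assumption made for illustration: a history is a dictionary from session to a list of (method, argument) pairs, an invocation event is the tuple (i, m, a, r, s), and the function name `return_val` is hypothetical. Local variables and the continuation map are left out.

```python
def return_val(history, alpha, session, ret, event_id):
    """Complete the first pending invocation of `session` with return value `ret`.

    Returns the shortened history and the extended abstract execution.
    """
    (method, arg), rest = history[session][0], history[session][1:]
    gamma_set, so_gamma = alpha
    new = (event_id, method, arg, ret, session)

    # Invocations already completed in this session are session-ordered before `new`.
    earlier = {g for g in gamma_set if g[4] == session}
    new_alpha = (gamma_set | {new}, so_gamma | {(g, new) for g in earlier})

    new_history = {**history, session: rest}
    return new_history, new_alpha
```

Applying the function twice to a session containing Push(1) followed by Pop yields two invocation events related by the session order, and an empty remaining history for that session.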


3.3 Replicated Store Semantics 


The replicated store state $\chi = (\Sigma, vis, ar, so)$ consists of the set of replicated store
events ($\Sigma$) and various relations on $\Sigma$. Events can either be read, write or update
events, and depending on the type of event, $\Sigma$ is partitioned into $\Sigma_R$, $\Sigma_W$ and
$\Sigma_U$. The visibility relation $vis \subseteq \Sigma \times \Sigma$ denotes the events visible to an event
and is used to determine the output of read events. The arbitration relation
$ar \subseteq (\Sigma_W \cup \Sigma_U) \times (\Sigma_W \cup \Sigma_U)$ provides a total ordering on write or update
events to the same location. Finally, the session order relation $so \subseteq \Sigma \times \Sigma$
provides a total ordering on events originating from the same session. All events
generated by statements in the same method invocation would belong to the
same session and hence would be related by $so$. We also define a happens-before
relation $hb = (vis \cup so)^+$ in the usual way.
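As a small illustration, happens-before can be computed as the transitive closure of the union of visibility and session order. The sketch below assumes relations are encoded as Python sets of (event, event) pairs; the function names are hypothetical and the quadratic fixpoint loop is chosen for clarity rather than efficiency.

```python
def transitive_closure(pairs):
    """Smallest transitive relation containing `pairs`."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def happens_before(vis, so):
    """hb as the transitive closure of vis united with so."""
    return transitive_closure(set(vis) | set(so))
```

For example, with session order {(1, 2), (3, 4)} and visibility {(2, 3)}, event 1 happens-before event 4 even though the two events belong to different sessions.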

We use $\Psi$ to refer to a consistency policy supported by the store. $\Psi$ is a
predicate on the store state, which must be maintained at every step of the
execution. $\Psi$ essentially controls the visibility relation on events based on session
or happens-before order. The following table illustrates the various consistency
policies that we consider in our work; all of these policies can be implemented
without the need for global coordination [1]³ (all $\sigma_i$ belong to $\Sigma$):


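Table 1. Axiomatic characterization of various weak consistency policies.

Consistency policy       | $\Psi(\Sigma, vis, ar, so)$
Read Your Writes [39]    | $so(\sigma_1, \sigma_2) \Rightarrow vis(\sigma_1, \sigma_2)$
Monotonic Writes [39]    | $so(\sigma_1, \sigma_2) \wedge vis(\sigma_2, \sigma_3) \Rightarrow vis(\sigma_1, \sigma_3)$
Monotonic Reads [39]     | $vis(\sigma_1, \sigma_2) \wedge so(\sigma_2, \sigma_3) \Rightarrow vis(\sigma_1, \sigma_3)$
Write Follows Read [39]  | $vis(\sigma_1, \sigma_2) \wedge so(\sigma_2, \sigma_3) \wedge vis(\sigma_3, \sigma_4) \Rightarrow vis(\sigma_1, \sigma_4)$
Causal Visibility [27]   | $hb(\sigma_1, \sigma_2) \wedge vis(\sigma_2, \sigma_3) \Rightarrow vis(\sigma_1, \sigma_3)$
Causal Consistency [27]  | $hb(\sigma_1, \sigma_2) \Rightarrow vis(\sigma_1, \sigma_2)$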


As we saw earlier in Sect. 2, MonotonicWrites enforces the constraint that if 
an event is visible, then all events before it in session order must also be visible. 
MonotonicReads requires that if an event is visible, it will continue to remain 
visible to all operations later in the session. On the other hand, WriteFollowsRead 
enforces that all events visible to a prior event in a session will continue to remain 
visible to other events which witness a later event of the session. 
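Since the session-guarantee axioms of Table 1 are implications whose conclusion is a visibility edge, the edges they force can be computed by a simple fixpoint. The Python sketch below applies this idea to the execution of Fig. 3; the numbering of events (1 to 8, in the order they appear in the figure), the function names, and the choice to start from the reads-from edges are all assumptions made for illustration. The sketch ignores arbitration and read values, so it only shows which visibility edges a policy makes unavoidable.

```python
def mw(vis, so):
    """Edges forced by Monotonic Writes: so(a, b) and vis(b, c) give vis(a, c)."""
    return {(a, c) for (a, b) in so for (b2, c) in vis if b == b2}


def mr(vis, so):
    """Edges forced by Monotonic Reads: vis(a, b) and so(b, c) give vis(a, c)."""
    return {(a, c) for (a, b) in vis for (b2, c) in so if b == b2}


def wfr(vis, so):
    """Edges forced by Write Follows Read."""
    return {(a, d)
            for (a, b) in vis
            for (b2, c) in so if b == b2
            for (c2, d) in vis if c == c2}


def forced_visibility(vis, so, rules):
    """Least visibility relation containing `vis` and closed under `rules`."""
    vis = set(vis)
    while True:
        new = set().union(*(rule(vis, so) for rule in rules)) - vis
        if not new:
            return vis
        vis |= new


# Fig. 3, one session per invocation:
# PUSH(1): 1 = W(L1.Val, 1), 2 = W(Top, L1)
# PUSH(2): 3 = R(Top): L1,   4 = W(Top, L2)
# POP: 2 : 5 = R(Top): L2,   6 = W(Top, L1)
# POP: 0 : 7 = R(Top): L1,   8 = R(L1.Val): 0
SO = {(1, 2), (3, 4), (5, 6), (7, 8)}
RF = {(2, 3), (4, 5), (6, 7)}
```

Under MW+MR the edge (1, 8) is not forced, so the second Pop may miss the write to L1.Val; adding WFR forces it, in line with the discussion of Fig. 3.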

We use the notation $\Sigma^l$ to denote the subset of events pertaining to location
$l$, and $\Sigma^s$ to denote the subset of events of session $s$. Given a set of events $\Sigma'$,
$MAX^l_{ar}(\Sigma')$ denotes the maximal events in $\Sigma'$ according to the relation $ar$ which
write to location $l$. Given events $\sigma \in \Sigma^l_R$, $\sigma' \in \Sigma^l_W$, we define the Reads-From
relation $rf$ in terms of $vis$ and $ar$ relations as follows:

\[
rf(\sigma', \sigma) \Leftrightarrow vis(\sigma', \sigma) \wedge \forall \sigma'' \in \Sigma^l_W.\ (vis(\sigma'', \sigma) \wedge \sigma'' \neq \sigma' \Rightarrow ar(\sigma'', \sigma'))
\]
3 Note that the lack of any constraints (i.e. $\Psi$ = true) corresponds to Strong Eventual
Consistency [18]. Since we assume SEC, our definition of Causal Consistency
corresponds to Causal Convergence (CCv) as defined by [8].
The rf relation essentially encodes the ‘last writer wins’ nature of the store,
whereby the most recent visible write event according to ar becomes the event 
supplying the value available to subsequent reads. The replicated store state 
evolves by the addition of new events. On addition of a write/update event, 
the arbitration order is appropriately modified to ensure that it remains a total 
order on events targeting the same location. In addition, we also ensure causal 
arbitration [11] by enforcing that ar and hb do not disagree with each other. For 
update and read events, the values that these events read depend upon the most 
recent write event to the same location visible to the events, which in turn is 
controlled by the consistency policy. To elaborate, consider the rule R-CAS: 


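\[
\frac{\begin{array}{c}
\Sigma' \subseteq \Sigma \qquad \sigma' \in MAX^l_{ar}(\Sigma') \qquad ar \subseteq ar' \\
act(\sigma') = W(l, m) \vee act(\sigma') = U(l, \_, m) \qquad \sigma = (i, s, U(l, m, n)) \qquad \forall \tau \in \Sigma^l_U.\ \neg rf(\sigma', \tau) \\
ar' \text{ is a total order on } \Sigma^l_W \cup \Sigma^l_U \cup \{\sigma\} \qquad \forall \sigma_1, \sigma_2.\ \neg(hb(\sigma_1, \sigma_2) \wedge ar'(\sigma_2, \sigma_1)) \\
vis' = vis \cup \Sigma' \times \{\sigma\} \qquad so' = so \cup \Sigma^s \times \{\sigma\} \qquad \Psi(\Sigma \cup \{\sigma\}, vis', ar', so')
\end{array}}
{(\Sigma, vis, ar, so) \xrightarrow{\sigma} (\Sigma \cup \{\sigma\}, vis', ar', so')}
\]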


Here, we want to add a new update event to location $l$. First, an arbitrary
subset ($\Sigma'$) of events of $\Sigma$ is selected. This step essentially corresponds to the
creation of a new replica on which the events in $\Sigma'$ have been applied. Then,
we select the most recent write event ($\sigma'$) from $\Sigma'$ such that atomicity of
the update event (and hence of the CAS statement responsible for the update) is
ensured. In particular, we require that no other update event has read from ($rf$) $\sigma'$.
The value written by $\sigma'$ (i.e. $m$) would be the read value of the update event.
$vis$, $so$ and $ar$ are appropriately updated, and the new store state must satisfy
the consistency policy $\Psi$, which in turn will govern the selection of the initial
subset $\Sigma'$. The formal rules for read and write events can be found in [32].
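The core of R-CAS (last writer wins, plus the atomicity side condition) can be sketched in Python as below. The event encoding (dictionaries with `id`, `kind`, `loc`, `val`), the function names, and the choice of placing the new update last in arbitration are assumptions for illustration; the initial value of a location is modelled as an explicit write event, and the checks on $\Psi$ and on agreement between $ar$ and $hb$ are omitted.

```python
def max_ar(visible_events, ar, loc):
    """Writers to `loc` among `visible_events` with no ar-later visible writer."""
    writers = [e for e in visible_events
               if e["kind"] in ("W", "U") and e["loc"] == loc]
    return [w for w in writers
            if not any((w["id"], o["id"]) in ar for o in writers)]


def apply_cas(events, ar, rf, visible_ids, loc, expected, new, eid):
    """Try to add the update U(loc, expected, new) given the visible set.

    Returns True and extends `events`, `ar`, `rf` in place if the rule applies
    for this visible set; returns False (leaving the state untouched) otherwise.
    """
    by_id = {e["id"]: e for e in events}
    visible = [by_id[i] for i in visible_ids]
    for src in max_ar(visible, ar, loc):
        # Atomicity: no other update event may already read from `src`.
        read_by_update = any(s == src["id"] and by_id[t]["kind"] == "U"
                             for (s, t) in rf)
        if src["val"] == expected and not read_by_update:
            upd = {"id": eid, "kind": "U", "loc": loc, "val": new}
            # One admissible ar': the new event follows every existing writer of loc.
            ar |= {(e["id"], eid) for e in events
                   if e["kind"] in ("W", "U") and e["loc"] == loc}
            rf.add((src["id"], eid))
            events.append(upd)
            return True
    return False
```

In this sketch, a second CAS that sees only the initial write to Top is rejected once a first CAS has read from that write, while a CAS that also sees the first update can succeed.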

Note that enforcing the above rule would in essence prohibit two CAS operations
from being executed concurrently, and hence would establish a global ordering
among the CAS operations. However, unlike in shared memory systems where
this is sufficient to establish a global ordering among all operations, thus ensuring
linearizability, in replicated systems this does not constrain the behavior of
other read and write operations (as we saw in Sect. 2), and hence more constraints
must be enforced through the consistency policy.

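We can now combine the language, abstract execution, and replicated store
rules to describe transitions of the LTS $\mathcal{M}_{h,L,\Psi}$, which simply requires the language
rules and the replicated store rules to agree on the structure of all replicated
store events:

\[
\frac{(h', \mu, c, \alpha) \xrightarrow{\sigma} (h'', \mu', c', \alpha) \qquad \chi \xrightarrow{\sigma} \chi'}
     {(\chi, h', \mu, c, \alpha) \xrightarrow{\sigma} (\chi', h'', \mu', c', \alpha)}
\qquad
\frac{(h', \mu, c, \alpha) \rightarrow (h'', \mu', c', \alpha')}
     {(\chi, h', \mu, c, \alpha) \rightarrow (\chi, h'', \mu', c', \alpha')}
\]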


Example: Let us revisit the Treiber Stack and in particular the violating execution
described in Fig. 2. The violating history consists of two sessions, with
one session containing the invocation push(1) and another containing pop. The
execution of push(1), following the language semantics, creates the events $\sigma_1$
and $\sigma_2$ such that $act(\sigma_1) = W(\mathtt{L.Val}, 1)$ and $act(\sigma_2) = U(\mathtt{Top}, \mathtt{NULL}, \mathtt{L})$, which
are both added to the store state. The execution of pop generates the read event
to Top, which following the store semantics picks the set $\Sigma' = \{\sigma_2\}$, resulting
in read event $\sigma_3$ such that $act(\sigma_3) = R(\mathtt{Top}, \mathtt{L})$. Under EC, the following read
to L.Val by pop is unconstrained and hence simply picks $\Sigma' = \emptyset$, resulting in
the event $\sigma_4$ such that $act(\sigma_4) = R(\mathtt{L.Val}, 0)$ where 0 is the initial value. This
results in violation of the AddRem specification.

Notice that $so(\sigma_1, \sigma_2)$ and $vis(\sigma_2, \sigma_3)$. Hence, under MW+MR, while generating
the read event to L.Val by pop, the store must pick $\Sigma' = \{\sigma_1, \sigma_2\}$ to satisfy
the axioms of MW+MR, so that the event must read the value 1, which prevents
the violation from occurring.
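The effect of the policy on this example can be checked by brute force. The Python sketch below enumerates the sets of writes visible to the two reads of pop ($\sigma_3$ and $\sigma_4$), keeps only those choices whose resulting visibility relation satisfies the chosen axioms, and collects the values the read of L.Val may return. The event numbering 1 to 4, the helper names, and the decision to check the axioms on the final visibility relation (rather than step by step as in the operational rules) are assumptions made for illustration.

```python
from itertools import chain, combinations

SO = {(1, 2), (3, 4)}   # push: sigma1 then sigma2; pop: sigma3 then sigma4


def monotonic_writes(vis, so):
    return all((a, c) in vis for (a, b) in so for (b2, c) in vis if b == b2)


def monotonic_reads(vis, so):
    return all((a, c) in vis for (a, b) in vis for (b2, c) in so if b == b2)


def subsets(items):
    items = list(items)
    return chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))


def possible_values(policies):
    """Values that the read of L.Val (sigma4) may return under `policies`."""
    values = set()
    for s3 in subsets([1, 2]):
        if 2 not in s3:          # sigma3 returned L, so it saw the write to Top
            continue
        for s4 in subsets([1, 2]):
            vis = {(w, 3) for w in s3} | {(w, 4) for w in s4}
            if all(policy(vis, SO) for policy in policies):
                # sigma1 is the only write to L.Val; otherwise the initial value 0.
                values.add(1 if 1 in s4 else 0)
    return values
```

With no policy (EC), or with MW or MR alone, both 0 and 1 remain possible; only the combination MW+MR leaves 1 as the sole admissible value.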


3.4 Correctness Specification 


Given an abstract execution obtained after executing a history on a replicated 
store under some consistency policy, how do we decide if it correctly obeys the 
semantics of the data structure implemented by the library? Linearization would 
require us to demonstrate a total order on all method invocations which would 
be admissible by a sequential reference implementation of the data structure. 
However, since the consistency model of a replicated system is substantially 
weaker than sequential consistency, it becomes necessary to also weaken correct- 
ness requirements [34,37]. We use the axiomatic specifications of data structure 
correctness as proposed by Emmi et al. [17], which are equivalent to standard 
linearizability, as our basis, and then weaken them systematically to adapt them 
to be useful in a replicated environment. Axiomatic specifications do not require 
a total order to be established on method invocations, do not refer back to a 
reference implementation, and also match the axiomatic, declarative nature of 
the semantics of the replicated store. 

First, we define all abstract executions that can be generated given a library 
implementation, a history and a consistency policy. The initial state of the replicated
store is assumed to be empty, i.e. $\chi_{init} = (\emptyset, \emptyset, \emptyset, \emptyset)$. Let $h_\epsilon$ be the empty
history which associates an empty sequence ($\epsilon$) of invocations to each session.
Let $c_{init}$ be the initial implementation state which simply associates the empty
program $\epsilon$ to each session.


Definition 1. Given a set of sessions $S$, a history $h$, a library implementation
$L$ and a consistency policy $\Psi$, the abstract executions generated by $\mathcal{M}_{h,L,\Psi}$ are
defined as: $[\![\mathcal{M}_{h,L,\Psi}]\!] = \{\alpha \mid (\chi_{init}, h, (\emptyset, \emptyset), c_{init}) \rightarrow^* (\_, h_\epsilon, \alpha, \_)\}$


Thus, executing all invocations in the history under a given consistency policy 
and library implementation gives rise to the set of final abstract executions. Due 
to the non-deterministic nature of the semantics, multiple abstract executions 
could be generated. Correctness of an abstract execution is specified in terms of 
various axioms that it must obey. An implementation is correct under a consistency
policy if for all possible histories, all final abstract executions generated
by the implementation obey the axioms.

To illustrate, let us consider the Stack data structure. It has two methods 
$M = \{Push, Pop\}$. Given a method invocation event $\gamma = (i, m, a, r, s)$, we assume
projection functions for all the respective components (e.g., $m$, $a$, and $r$). Further,
we assume a match predicate relating two method invocation events defined thus:

\[
match(\gamma_1, \gamma_2) \Leftrightarrow m(\gamma_1) = Push \wedge m(\gamma_2) = Pop \wedge a(\gamma_1) = r(\gamma_2)
\]

Let EMPTY denote a special value signifying the empty return value (see, e.g. the
Treiber Stack impl. in Fig. 1). Consider an abstract execution $\alpha = (\Gamma, so_\Gamma)$. We
define the happens-before relation for method invocations as $hb_\Gamma = (match \cup so_\Gamma)^+$.
Then, the correctness of $\alpha$ can be specified in terms of the following
axioms:


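- AddRem: $\forall \gamma \in \Gamma.\ m(\gamma) = Pop \wedge r(\gamma) \neq \mathtt{EMPTY} \Rightarrow \exists \gamma' \in \Gamma.\ match(\gamma', \gamma)$

- Injective: $\forall \gamma_1, \gamma_2, \gamma_3 \in \Gamma.\ match(\gamma_1, \gamma_2) \wedge match(\gamma_1, \gamma_3) \Rightarrow \gamma_2 = \gamma_3$

- Empty: $\forall \gamma_1, \gamma_2 \in \Gamma.\ m(\gamma_1) = Pop \wedge r(\gamma_1) = \mathtt{EMPTY} \wedge m(\gamma_2) = Push \wedge hb_\Gamma(\gamma_2, \gamma_1) \Rightarrow \exists \gamma_3 \in \Gamma.\ match(\gamma_2, \gamma_3)$

- LIFO-1: $\forall \gamma_1, \gamma_2, \gamma_3 \in \Gamma.\ m(\gamma_1) = Push \wedge match(\gamma_2, \gamma_3) \wedge hb_\Gamma(\gamma_2, \gamma_1) \wedge hb_\Gamma(\gamma_1, \gamma_3) \Rightarrow \exists \gamma_4 \in \Gamma.\ match(\gamma_1, \gamma_4)$

- LIFO-2: $\forall \gamma_1, \gamma_2, \gamma_3, \gamma_4 \in \Gamma.\ \neg(match(\gamma_1, \gamma_4) \wedge match(\gamma_2, \gamma_3) \wedge hb_\Gamma(\gamma_2, \gamma_1) \wedge hb_\Gamma(\gamma_3, \gamma_4) \wedge hb_\Gamma(\gamma_1, \gamma_3))$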


These axioms follow from those given in [17], except that instead of using a
linearization order as done in [17], we use a weaker happens-before order $hb_\Gamma$.
It is also possible to use the even weaker session order $so_\Gamma$ in place of $hb_\Gamma$. We
have already seen the AddRem axiom in Sect. 2. The Injective axiom enforces that
an element pushed onto the stack is not popped more than once⁴. The Empty
axiom says that if a pop invocation ($\gamma_1$) returns EMPTY and if there is a push
invocation ($\gamma_2$) that happens-before it, then $\gamma_2$ must be matched to another
pop. This reflects the expected stack-like behavior from the point of view of
a client who observes these invocations. The LIFO-1 property specifies that
if a push invocation $\gamma_2$ happens-before another push invocation $\gamma_1$, with both
of them happening-before a pop invocation $\gamma_3$, and if $\gamma_2$ is matched with $\gamma_3$,
then to respect the LIFO order, $\gamma_1$ must also be matched (to some $\gamma_4$). LIFO-2
complements LIFO-1 by requiring that $\gamma_3$ cannot happen-before such a $\gamma_4$. The
specifications for other data structures we have considered, including Queue and
Exchanger, can be found in [32].
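To illustrate how such axioms are evaluated on a concrete abstract execution, the following Python sketch checks AddRem and Injective on a finite set of invocation events. The dictionary encoding of an event (keys `i`, `m`, `a`, `r`, `s`) and the function names are assumptions for illustration; the axioms that mention the happens-before order are not covered here.

```python
EMPTY = "EMPTY"


def match(g1, g2):
    """g1 is a Push whose argument equals the return value of the Pop g2."""
    return g1["m"] == "Push" and g2["m"] == "Pop" and g1["a"] == g2["r"]


def add_rem(gamma):
    """Every non-EMPTY Pop is matched by some Push."""
    return all(any(match(p, g) for p in gamma)
               for g in gamma
               if g["m"] == "Pop" and g["r"] != EMPTY)


def injective(gamma):
    """No Push is matched by two different Pops."""
    return all(g2["i"] == g3["i"]
               for g1 in gamma for g2 in gamma for g3 in gamma
               if match(g1, g2) and match(g1, g3))
```

On the execution of Fig. 2 (a Push of 1 and a Pop returning 0), `add_rem` returns False, which is the violation discussed in Sect. 2.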


4 Bounded Verification 


We now present an automated bounded verification procedure capable of gen- 
erating abstract executions that violate data structure correctness specifications 


4 Note that we assume all methods are called with distinct arguments.
under a given consistency policy. We take advantage of the axiomatic nature of
both the semantics and specification and reduce the problem to that of checking
the satisfiability of a collection of formulae in first-order logic (FOL), which can
be dispatched to an off-the-shelf SMT solver. In particular, our strategy is to
instantiate a bounded number of invocations ($k$) without specifying their method
types, arguments, or session information, and instead leave it up to the solver to
search efficiently among all histories of length $k$.


4.1 Vocabulary 


Given a library L = (M, Impl), we first take each method implementation and
unroll loops up to a constant bound⁵, and give a label to each program statement
that interacts with a replicated location (e.g., see the Treiber Stack implementation
in Fig. 1). Let L denote this set of labels.

We use an uninterpreted, finite sort I to represent invocations in the history
that we wish to construct, and then constrain this sort to contain only the
distinct elements INV1, ..., INVk. In addition, we use uninterpreted sorts E and
V to represent the set of replicated store events and the values that are read or
written by them. We define the function meth : I → M to associate a method
type with each invocation. We use an uninterpreted sort S to denote the set of
sessions involved in the history. The function sess : I → S associates a session
with each invocation.

For each method m ∈ M and each program statement labeled n in the implementation
Impl(m), we define the function P_mn : I → E to associate the event
generated by the program statement with an invocation. In addition, functions
arg, ret : I → V associate the argument and return values with each invocation. For
every local variable v used in a program, the function p_v : I → V denotes the value
of the local variable in that invocation. The predicate so_I : I × I → 𝔹 denotes the
session order relation among invocation instances.

We define functions loc, rval, wval : E → V to associate locations, values
read, and values written with events, respectively. We use a separate uninterpreted,
finite sort of event types containing the elements R, W, U. The function
Etype : E → {R, W, U} associates a type with each event. Finally, predicates
vis, ar, so_E, rf : E × E → 𝔹 denote the visibility, arbitration, session order,
and read-from relations, respectively, among events.

For every replicated location, we also instantiate a distinct value referring to 
the location. For example, for the Treiber Stack implementation (Fig. 1), we have 
distinct values for Top and for the Val and Next fields of each New Node generated 
by an invocation. Since the number of invocations is fixed (k), the number of 
such locations to be instantiated can also be pre-determined statically. We also
define a function Initval : V → V, which fixes an initial value for every location
used in the execution.
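As a small illustration of why the set of locations can be fixed statically, the sketch below enumerates the location values for a Treiber Stack history of k invocations: one shared Top pointer plus a Val and a Next field per invocation. The naming scheme and the concrete initial values (NULL for pointers, 0 for values) are assumptions made for this example.

```python
def treiber_locations(k):
    # One shared Top pointer, plus a Val and a Next field for the node that
    # each of the k invocations may allocate (the naming scheme is assumed).
    locs = ["Top"]
    for i in range(1, k + 1):
        locs += [f"INV{i}.node.Val", f"INV{i}.node.Next"]
    return locs

def initval(locs):
    # Assumed initial values: pointers start as NULL, Val fields as 0.
    return {l: (0 if l.endswith(".Val") else "NULL") for l in locs}

locs = treiber_locations(3)
init = initval(locs)
print(len(locs), init["Top"], init["INV2.node.Val"])
```

For k invocations this yields 1 + 2k locations, so the size of the encoding grows linearly in the bound.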


5 Loops are typically only used to busy wait for a successful CAS operation in the 
applications we consider. 
4.2 Implementation Constraints


We now describe constraints on the events imposed by the implementation. First,
note that even though the set of functions {P_mn | m ∈ M, n ∈ L} is defined
for every invocation, an invocation i has a fixed method type meth(i),
and hence only generates events corresponding to program statements in the
implementation of meth(i). We designate a special event ⊥ : E and associate it with
the program statements of every other method type using the following constraint:


∀i ∈ I. ∀m ∈ M. ∀n ∈ L. m ≠ meth(i) ⇒ P_mn(i) = ⊥


For program statements in the implementation of meth(i), we add constraints for
every statement based on its type. Note that loops have already been unrolled
and for every statement labeled n in method m, we collect the conditionals of
any if statement enclosing the statement and replace any local variable v used
in those conditionals with the corresponding function p_v(i) (for invocation i) to
obtain the formula ⟦mn⟧_i. To illustrate the constraints added for different types
of statements, consider the rule for reads:


                          Impl(m) : n : v = l
-----------------------------------------------------------------------------
∀i ∈ I. (meth(i) = m ∧ ⟦mn⟧_i) ⇒ (Etype(P_mn(i)) = R ∧ loc(P_mn(i)) = l
                                   ∧ rval(P_mn(i)) = p_v(i))


The rule essentially specifies the constraint for the statement labeled n in the
implementation of method m if it is a read operation. The constraint appropriately
sets the Etype, loc and rval functions of event P_mn(i) for every invocation
i, if the invocation has a method type of m and the enclosing if conditionals (if
any) are satisfied. The rules for write and CAS statements are similar (they also
set the wval function and additionally CAS also checks whether the value read is
equal to its first argument) and can be found in [32]. In addition, we also relate
adjacent events of the same invocation with the session order relation so_E.
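The two constraints of this subsection can be read as checks on a candidate model. The Python sketch below evaluates them over a hand-written two-invocation model rather than generating solver formulae; the dictionary-based model, the local variable name `t`, and the function names are hypothetical, and `None` stands in for the special "no event" value.

```python
BOTTOM = None  # stands in for the special "no event" value

def other_methods_unused(invs, meth, P):
    # For every invocation i and every labelled statement (m, n) of another
    # method, the event function must return the special value.
    return all(P[(m, n)][i] is BOTTOM
               for (m, n) in P for i in invs if m != meth[i])

def read_rule_holds(invs, meth, P, m, n, l, v, guard, etype, loc, rval, local):
    # Statement n of Impl(m) is `v = l`: the generated event is a read of
    # location l whose read value equals the local variable v.
    for i in invs:
        if meth[i] == m and guard(i):
            e = P[(m, n)][i]
            if not (etype[e] == "R" and loc[e] == l and rval[e] == local[(v, i)]):
                return False
    return True

# Hypothetical two-invocation model: INV1 is a push, INV2 is a pop.
invs = ["INV1", "INV2"]
meth = {"INV1": "push", "INV2": "pop"}
P = {("push", 5): {"INV1": "e1", "INV2": BOTTOM},
     ("pop", 6): {"INV1": BOTTOM, "INV2": "e2"}}
etype = {"e1": "U", "e2": "R"}
loc = {"e1": "Top", "e2": "Top"}
rval = {"e1": "NULL", "e2": "L1"}
local = {("t", "INV2"): "L1"}   # local variable t of the pop holds what it read

ok_bottom = other_methods_unused(invs, meth, P)
ok_read = read_rule_holds(invs, meth, P, "pop", 6, "Top", "t",
                          lambda i: True, etype, loc, rval, local)
print(ok_bottom, ok_read)
```

In the actual encoding these conditions are asserted as formulae and the solver chooses the model; the sketch only shows what a satisfying model has to look like.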


4.3 Abstract Execution Constraints 


On encountering a return statement, we record the returned value using the
following constraint:


                     Impl(m) : n : return v
-------------------------------------------------------------------
∀i ∈ I. (meth(i) = m ∧ ⟦mn⟧_i) ⇒ (ret(i) = p_v(i) ∧ completed(i))


Apart from setting the ret value, we also use another unary predicate
completed to encode that the invocation has completed and reached the return
statement. This is needed because we are unrolling loops up to a fixed bound.
Since we know the last program statement statically, if we encounter this statement
without reaching return for an invocation, then completed will be set to
false.

We also encode the constraint that the session order relation (so_I) among
invocations of the same session is a total order. Finally, we also encode that if
two invocations i1 and i2 are in session order (so_I(i1, i2)), then the last event of
i1 and the first event of i2 are in event session order (so_E).


4.4 Replicated Store Constraints 


We must also encode constraints ensuring that the semantics of the replicated 
store are preserved. First, we capture various properties of relations on events, 
viz. vis is anti-symmetric and irreflexive, ar among write events to the same
location is a total order, vis and the session order do not clash with each other,
and ar does not clash with vis or the session order. All these constraints are
implicitly enforced by the semantics
of the replicated store, so that the state of the store reached after any number 
of execution steps must obey them. 

The various consistency policies in Table 1 can be directly encoded using the 
relations defined in the vocabulary. We now turn to encoding the last-writer-wins 
nature of the data store, which relates the vis and ar relations with the read and 
write values (rval and wval) of the events. 


∀e1, e2 ∈ E. rf(e1, e2) ⇒ vis(e1, e2) ∧ wval(e1) = rval(e2) ∧
    ∀e3 ∈ E_loc(e2). (vis(e3, e2) ⇒ e3 = e1 ∨ ar(e3, e1))


∀e1 ∈ E_R. (∀e2 ∈ E. ¬rf(e2, e1)) ⇒ rval(e1) = Initval(loc(e1))


In the above constraints, we use the notation E_l to indicate only those events
that write to location l, and E_R for read events. The first constraint enforces the
reads-from event to be the most recent visible event according to the arbitration 
order, and also constrains the read value. The second constraint disallows out- 
of-thin-air reads by enforcing that if there are no rf events, then the value read 
must be the initial value. As an optimization, while encoding this constraint in 
our tool, we enumerate all possible write events to the same location (which 
are guaranteed to be finite since we only have k invocations) in the antecedent, 
instead of the universal quantification used above. 
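The two last-writer-wins constraints can be checked directly on a small, explicit set of events. The sketch below does so in Python on a hypothetical three-event example; the dictionary representation and the choice to treat both W and U events as writers of a location are assumptions of this illustration, not details of the tool.

```python
def last_writer_wins(events, etype, loc, rval, wval, vis, ar, rf, initval):
    def writes_to(l):
        # Assumption: both plain writes (W) and updates (U) count as writes.
        return [e for e in events if etype[e] in ("W", "U") and loc[e] == l]

    # First constraint: the source of a read is visible, carries the value
    # read, and is the latest visible write in arbitration order.
    for e1, e2 in rf:
        if (e1, e2) not in vis or wval[e1] != rval[e2]:
            return False
        for e3 in writes_to(loc[e2]):
            if (e3, e2) in vis and e3 != e1 and (e3, e1) not in ar:
                return False

    # Second constraint: a read with no source returns the initial value.
    for e1 in events:
        if etype[e1] == "R" and not any((e2, e1) in rf for e2 in events):
            if rval[e1] != initval[loc[e1]]:
                return False
    return True

events = ["w1", "w2", "r"]
etype = {"w1": "W", "w2": "W", "r": "R"}
loc = {e: "x" for e in events}
wval = {"w1": 1, "w2": 2}
vis = {("w1", "r"), ("w2", "r")}
ar = {("w1", "w2")}
initval = {"x": 0}

latest = last_writer_wins(events, etype, loc, {"r": 2}, wval, vis, ar, {("w2", "r")}, initval)
stale = last_writer_wins(events, etype, loc, {"r": 1}, wval, vis, ar, {("w1", "r")}, initval)
unseen = last_writer_wins(events, etype, loc, {"r": 0}, wval, set(), ar, set(), initval)
print(latest, stale, unseen)
```

Reading from the arbitration-latest visible write is accepted, reading from an older visible write is rejected, and a read that sees no write is accepted only because it returns the initial value.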

For CAS operations which generate update events, we encode the constraint 
(as derived from the semantics rule R-CAS) that two update events should not 
read from the same event: 


∀e, e1, e2 ∈ E. Etype(e1) = U ∧ Etype(e2) = U ∧ rf(e, e1) ∧ rf(e, e2) ⇒ e1 = e2
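A minimal Python sketch of this check on an explicit read-from relation follows; the event names and the helper name `cas_reads_unique` are hypothetical.

```python
def cas_reads_unique(rf, etype):
    # Group update events by the event they read from; no group may have two members.
    readers = {}
    for src, dst in rf:
        if etype[dst] == "U":
            readers.setdefault(src, set()).add(dst)
    return all(len(us) <= 1 for us in readers.values())

etype = {"u1": "U", "u2": "U", "u3": "U", "r": "R"}
chain = {("u1", "u2"), ("u2", "u3"), ("u2", "r")}   # each update reads a different event
fork = {("u1", "u2"), ("u1", "u3")}                  # two updates read the same event
print(cas_reads_unique(chain, etype), cas_reads_unique(fork, etype))
```

A plain read may share its source with an update, as in the first relation; only two updates reading from the same event are ruled out.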


4.5 Specification Constraints 


The axioms of correctness for data structures only use an invocation’s argu- 
ment and return values, and the session order relation among invocations in the 
abstract execution. Thus, they can be directly encoded using our vocabulary. 
Given an axiom, we encode its negation to find histories that have abstract
executions violating it.
For example, to find violations of the AddRem axiom, we add the following
constraint:


∃i1 ∈ I. meth(i1) = POP ∧ ret(i1) ≠ EMPTY ∧ ∀i2 ∈ I. ¬match(i2, i1)


where we use the predicate match : I × I → 𝔹, defined in a similar manner as in
Sect. 3.4. This completes the description of our encoding.
Our main soundness result can be formalized thus⁶:


Theorem 1. Given a library implementation L, a consistency policy Ψ, and a correctness
axiom Φ, if the collection of formulae described above is satisfiable, then
there exists a history h and an abstract execution of h under L and Ψ that violates Φ.


5 Experimental Evaluation 


Table 2. Consistency policies required for various implementations and specifications.


Benchmark               AddRem      Injective  Empty[SO]  Empty[HB]   FIFO-1/LIFO-1/  FIFO-2/   Max time
                                                                      Exchange        LIFO-2    (s)
2Lock Queue [29]        MW+MR+WFR   MW+MR      CC         CC          MW+MR           MW+MR     269
LockFree Queue [29]     MW+MR       EC         CC         CC          MW+MR           EC        152
HW Queue [22]           EC          EC         RMW        MW+MR+RMW   CC              MW+MR     61
Treiber Stack [40]      MW+MR+WFR   EC         CC         CC          MW+MR+WFR       EC        245
Elimination Stack [20]  MW+MR+WFR   EC         CC         CC          MW+MR+WFR       MW        65
Exchanger [20]          MW          EC         -NA-       -NA-        MW              -NA-      40


We have implemented our bounded verification procedure and applied it to 
a number of library implementations that have been widely used in the world
of shared-memory systems. We generate FOL formulae for each implementation 
as described in Sect. 4 and dispatch them to Z3 to determine their satisfiability. 
For queues, we have used the 2LockQueue, LockFree Queue and Herlihy and Wing 
(HW) Queue implementations, while for stacks, we have applied our approach 
on the Treiber and Elimination Stack implementations. The Elimination stack uses 
the exchanger implementation, and so we have also checked the correctness of 
the exchanger. 

6 A proof sketch can be found in [32].

Since our analysis takes as input the bound on the number of invocations (k),
the consistency policy, and the specification, we deploy the system as follows.
For each implementation and specification pairing, we start with bound k = 2
and the weakest consistency policy (EC). If we do not find any violation, then we
increase the bound by 1 and perform the analysis again. On the other hand, if
we do find a violation, then by Theorem 1, we know that it is guaranteed to be
an actual violation. We record its structure from the satisfiable model returned
by Z3, and then strengthen the consistency policy to the next higher level. We
continue this process until we exhaust our verification time budget (of 1 hour
per benchmark implementation). Note that all the consistency policies that we
consider can be arranged in a lattice [38]: the higher one goes up the lattice,
the stronger the consistency policies become, which means they allow only
a subset of the executions that are allowed by weaker policies. Our tool
automatically traverses this lattice to find the weakest consistency policy at
which no bounded violation is found.
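The deployment loop described above can be sketched as follows. This is an illustration under simplifying assumptions, not the tool's driver: the lattice is flattened into a chain from weakest to strongest, the one-hour time budget is replaced by a maximum bound `max_k`, the bound is not reset when the policy is strengthened, and `find_violation` stands in for the call to the SMT solver. The toy oracle and its thresholds are made up for the example.

```python
def explore(find_violation, policies, max_k, k0=2):
    # policies: assumed chain, ordered from weakest to strongest.
    k, level, found = k0, 0, []
    while k <= max_k and level < len(policies):
        witness = find_violation(k, policies[level])
        if witness is None:
            k += 1                       # no violation: try longer histories
        else:
            found.append((policies[level], k, witness))
            level += 1                   # real violation: strengthen the policy
    weakest_ok = policies[level] if level < len(policies) else None
    return weakest_ok, found

# Toy oracle with made-up thresholds: a violation exists under a policy once
# the history is long enough, and never under the policies not listed.
THRESHOLD = {"EC": 3, "MW+MR": 6}

def toy_oracle(k, policy):
    need = THRESHOLD.get(policy)
    return f"history of length {k}" if need is not None and k >= need else None

chain = ["EC", "MW+MR", "MW+MR+WFR", "CC"]
weakest_ok, found = explore(toy_oracle, chain, max_k=8)
print(weakest_ok, [(p, k) for p, k, _ in found])
```

Keeping the bound when moving to a stronger policy is reasonable here because a stronger policy admits only a subset of the executions of a weaker one, so histories shorter than the current bound cannot start to violate the specification.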

Table 2 summarizes the results of this process. For each pair of benchmark 
implementation and correctness specification, it shows the weakest consistency 
policy at which we did not find any violations. This means that at every con- 
sistency policy weaker than the one specified in the table, violations were dis- 
covered. For each benchmark, we also note the maximum time needed to find a 
violation for any specification by Z3. Some specifications were discussed in §3.4,
with Empty[SO] meaning we replace the relation hbr with sor in the specifica- 
tion; the correctness specifications for Queues and Exchangers are given in [32]. 
Across all benchmarks, we found that the longest history which violated any 
specification within the time bound considered consisted of 6 invocations. 

To empirically validate our results, we also executed all the benchmarks at the 
appropriate consistency levels on Cassandra, a real-world replicated data store. 
We configured Cassandra with 3 replicas running on Amazon EC2 instances at 
different physical locations (all on the US East Coast). We randomly generated 
client invocations at all 3 replicas and ran each implementation for 4 h (on 
average 92000 invocations/benchmark). We collected the resulting traces and 
checked the specifications. We did not find any violation of the specifications, 
and surmise that violations, when they do occur, manifest in smaller executions 
that can be systematically checked by our analysis. 

The results yield a number of interesting observations. First and foremost, 
note that even for the same benchmark, different correctness specifications
require different consistency policies, ranging from the weakest, Eventual
Consistency (EC), to the strongest, Causal Consistency (CC). This suggests that,
depending upon the requirements of the clients of the library, there is a trade-off
between consistency and correctness that can be effectively explored. It has
long been known that Causal Consistency incurs a performance penalty [3] due 
to expensive dependency tracking, significant metadata storage, and long wait 
times for all causally dependent data to arrive. A number of recent approaches 
[9, 14,28] have looked at improving the performance of Causal Consistency, 
mainly by reducing the amount of dependent data required. Our experiments 
suggest that many important correctness properties of library implementations 
may not require CC, but would work correctly under weaker session guarantees 
or even EC. Note that, as we discussed in Sect. 2, MW+MR requires only that all data
be propagated from the same session, while MW+MR+WFR requires data to
be propagated across the entire causal chain.

Another interesting observation is that important properties such as Injective 
and FIFO/LIFO only require EC for most benchmarks. We also notice that for the 
same correctness specification, different benchmarks require different consistency 
policies, especially among the various Queue benchmarks. This illustrates that 
clients have flexibility in choosing an implementation, based on the properties 
that they need. For example, an HW queue can satisfy the AddRem specifica- 
tion at the weakest consistency policy (EC), but requires CC for FIFO-1, which 
can be satisfied using just session guarantees by both 2LockQueue and LockFree- 
Queue. No single queue implementation provides all correctness guarantees at 
the weakest consistency level. For stacks, the Elimination Stack and the Treiber 
Stack require the same consistency policies for every specification except LIFO-2, 
for which the Elimination Stack requires MW for the Exchange property of the 
underlying Exchanger to be satisfied. By analyzing violations, we also found that 
both the access pattern of different implementations as well as the semantics of 
the data structure (stack vs. queue) played a major role in determining how and 
if violations occur. 

Note that even though we unroll loops up to a fixed bound, for all benchmarks
except LockFree Queue, the unrolling factor does not matter because in every 
loop, every iteration except the last only performs read events, and the values 
read are only used in the same iteration. Hence, only the last iteration which 
performs a write/update event is relevant; unrolling the loop once is sufficient. 


push(1)                      push(3)                      pop : 0
  5 : U(Top, NULL, L1)         5 : U(Top, L2, L3)           6 : R(Top, L2)
push(2)                      pop : 3                        7 : R(L2.Val, 0)
  5 : U(Top, L1, L2)           9 : U(Top, L3, L2)           9 : U(Top, L2, L1)
pop : 1
  6 : R(Top, L1)


Fig. 4. A violation of LIFO-1 by Treiber Stack under MW+MR involving 6 invocations


In order to illustrate the complex violations automatically generated by our
framework, consider the violation of LIFO-1 in the Treiber stack implementation
under MW+MR in Fig. 4. Here, invocations in the same column are in the
same session. Following the notation used in the specification in Sect. 3.4,
y1 = push(2), y2 = push(1), y3 = pop : 1. As a concrete violation of the specification,
y2 happens before y1, but y3 returns the value pushed by y2 even though
y1 is unmatched, thus disobeying the LIFO property. The reason behind this
violation is that another pop operation (pop : 0) is actually popping the element
pushed by push(2), but it does not read the value 2 and instead reads the initial
value 0 (thus also violating AddRem). As a result, the last pop operation in the
leftmost session sees only the element 1 on the stack. We note that there is no
violation of smaller length under MW+MR. By upgrading the consistency level
to MW+MR+WFR, the violation is eliminated.
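The LIFO-1 violation in this history can be replayed with a few lines of Python. The sketch is illustrative only: it encodes the six invocations by session, uses session order alone as the happens-before relation (an under-approximation that is enough here, since the three invocations involved are in the same session), and assumes that a pop is matched to the push whose argument it returns. All identifiers are hypothetical.

```python
from itertools import product

# (name, method, value): value is the argument of a push or the result of a pop.
s1 = [("push1", "push", 1), ("push2", "push", 2), ("pop1", "pop", 1)]  # leftmost session
s2 = [("push3", "push", 3), ("pop3", "pop", 3)]
s3 = [("pop0", "pop", 0)]
invs = s1 + s2 + s3

# Assumption: happens-before is approximated by session order only.
hb = {(a, b) for s in (s1, s2, s3) for i, a in enumerate(s) for b in s[i + 1:]}

def match(push, pop):
    # Assumed definition: a pop is matched to the push whose argument it returns.
    return push[1] == "push" and pop[1] == "pop" and push[2] == pop[2]

def lifo1_violations(invs, hb):
    # y2 and y1 are pushes with y2 before y1, both before pop y3, y2 is matched
    # with y3, and yet no pop is matched with y1.
    out = []
    for y1, y2, y3 in product(invs, repeat=3):
        if (y1[1] == "push" and (y2, y1) in hb and (y1, y3) in hb and (y2, y3) in hb
                and match(y2, y3) and not any(match(y1, y4) for y4 in invs)):
            out.append((y1[0], y2[0], y3[0]))
    return out

print(lifo1_violations(invs, hb))
```

The only triple reported is (push2, push1, pop1), which corresponds to y1 = push(2), y2 = push(1), y3 = pop : 1 in the discussion above.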


6 Related Work and Conclusion 


Verifying applications under weak consistency has received significant attention 
in recent years. A number of efforts [2,19,23,25,38] have looked at the prob- 
lem of verifying arbitrary safety invariants while others have considered verifi- 
cation with respect to distributed database applications and specific high-level 
transactional properties [5–7,10,30,35]. These results are orthogonal to the work
described here, since neither considers the question of safely migrating performant
concurrent libraries to a replicated environment. 

More directly related are proposals to deal with the specification and verifica- 
tion of various properties of CRDTs [12,18,31,41,42]. CRDTs also offer a library 
interface to clients and have been implemented for various data structures such 
as set, list, map, etc. They follow a different system model than the library imple- 
mentations that we have considered in our work, and typically do not require 
any form of synchronization. However, this requirement imposes stringent con- 
straints on their design (for example, in an op-based CRDT, all operations have 
to commute with each other). We are not aware of any CRDT-like implementa- 
tion of concurrent data structures such as Queue, Stack and Exchangers that we 
have considered here. 

Prior works [18,31] have also developed automated or semi-automated 
approaches to verify the convergence of CRDTs, an important but fairly low-level 
property that does not shed much insight on the correctness of libraries built 
using them. High-level correctness specifications of CRDTs are either given in 
terms of abstract RDT specifications [12,42] or customized specification frame- 
works such as replication-aware linearizability [41]. Both of these specification 
styles are closer to linearizability, but since direct linearization of all operations 
in an execution is not possible in a distributed environment, both approaches allow
relaxations to help decide a linearization order. These relaxations typically take 
the form of allowing different per-invocation linearizations based on the type of 
the invocation and the visibility relation. This can lead to complicated specifica- 
tions that can be substantially different from their shared-memory counterparts, 
complicating verification. In contrast, our axiomatic style also allows clients of 
the library to know exactly how the relaxations in a replicated environment will 
impact observable behavior. Finally, unlike other prior work, we develop a fully 
automated approach for bounded verification of library implementations. 

There has also been recent interest in specifying and verifying concurrent 
library implementations for shared memory systems [16] and weak memory mod- 
els [15,34]. While the specification style of weak memory models bears some 
superficial resemblance to that of weak consistency, the underlying system model
is quite different, and weak consistency models allow relaxed behaviors which
are not allowed by weak memory models. They also offer more fine-grained con- 
trol than possible under weak memory given their ability to provide session-level 
as well as system-wide consistency guarantees to individual low-level operations. 
[34] proposes axiomatic specifications of libraries using happens-before and pro- 
gram orders. Our specifications, while similar in spirit, are more fine-grained and 
better suited to replicated systems. 

To conclude, we tackle the problem of migrating concurrent library implemen- 
tations from shared-memory systems to replicated, distributed ones. We define a 
sensible semantics for such implementations on a replicated store parametric in 
the consistency policy of the store and describe how to migrate the correctness 
specifications for such libraries with minimal changes. Our verification framework 
automatically finds bounded violations of these specifications. Parametricity of
consistency policies in the analysis allows us to find the weakest policy that 
eliminates a discovered violation. Our experiments have demonstrated that the 
proposed framework is effective in finding non-trivial violations in a number of 
challenging and diverse benchmarks. We also find that the spectrum of weak 
consistency policies in replicated systems can be effectively explored to trade off
correctness and performance. 


Acknowledgments. We thank the anonymous reviewers for their insightful com- 

Abstract. This paper presents a foundation for refining concurrent programs
with structured control flow. The verification problem is decomposed
into subproblems that aid interactive program development, proof
reuse, and automation. The formalization in this paper is the basis of a 
new design and implementation of the CIVL verifier. 


1 Introduction 


We present a solution to the problem of proving that no execution of a concurrent 
program leads to a failure. This problem is equivalent to proving an arbitrary 
safety property on the program. In deductive verification, a proof system decom- 
poses this verification problem into a set of proof obligations (or verification 
conditions), and discharging these obligations implies the correctness of the pro- 
gram. At its core, any proof system depends on inductive invariants, and, in 
general, these have to be supplied manually. Inventing an inductive invariant is 
especially challenging for concurrent programs, since it has to capture compli- 
cated relationships over the entire program state, across all concurrent compu- 
tations. Thus, the main practical obstacle to deductive verification is a suitable 
interaction mode for the programmer to invent and supply the necessary proof 
hints. This paper develops and implements a systematic conceptual framework 
for supplying these proof hints on a structured representation of the concurrent 
program, specifically eliminating the need to write complex invariants on the 
low-level encoding of the program as a flat transition system. 

The CIVL verifier [18,25] addresses the aforementioned challenge by advocating
layered refinement over structured concurrent programs. Instead of the
monolithic approach that requires the programmer to prove the safety of a program
P directly, CIVL allows the programmer to specify a chain of increasingly
simpler programs P = P_0, P_1, ..., P_n = P' such that the safety of P_i implies the
safety of P_{i-1} for all i ∈ [1, n], thus transferring the safety obligation on P to P'.
The overall correctness of the program is established piecemeal by focusing on
the invariant required for each refinement step separately. While the programmer
does the creative work of specifying the chain of programs and the inductive
invariant justifying each link in the chain, the tool automatically constructs the
verification conditions underlying each refinement step.

The core principle of a layered refinement proof in CIVL is iterative program 
simplification through two kinds of creative reasoning. First, the programmer 
must think about the primitive atomic actions used to specify a particular program
P_i in the chain of programs. These atomic actions must be chosen to have
useful commutativity properties which allow the tool to provably eliminate preemptions
at many control locations in P_i, thus creating large preemption-free
execution fragments. Second, the programmer must think about the justification
for the transformation of P_i into the next program P_{i+1}. This transformation
may be complex because (1) some of the variables in P_i may become irrelevant,
(2) new variables may be needed for the primitive atomic actions in P_{i+1}, and
(3) the transformation may simplify complex control flow (branching, procedure 
calls, recursion, etc.) into a single step that executes an atomic action. This 
paper focuses on the necessary foundation and tool support for this second kind 
of creative reasoning. 

We present our technique on an idealized yet general language RefPL, suit- 
able for expressing structured parallelism, asynchronous computation, atomic 
actions of arbitrary granularity, and dynamically-scoped preemption-free code 
fragments. Using the design of RefPL and the formalization of its operational 
semantics, we present two technical contributions. 

Our first contribution is a general proof rule for soundly abstracting a recur- 
sive RefPL program P into another RefPL program P’ that hides subsets of global 
variables, local variables, procedures, and atomic actions in P. Our proof rule 
goes beyond CIVL in two ways. First, it provides the capability to hide local
variables of procedures, specifically parameters, in addition to global variables. 
This capability allows us to replace a procedure with an atomic action with a 
smaller interface by hiding the extra parameters. Refinement proofs are sim- 
plified because it becomes easy to introduce local snapshots of global variables 
needed for specifications, pass these snapshots around as parameters to proce- 
dures, and finally recover the original interface by hiding these extra parameters. 
Second, unlike CIVL, our proof rule is capable of performing refinement proofs
on arbitrarily recursive programs. Since hiding low-level details is the core prin- 
ciple of the layered refinement methodology, our proof rule contributes towards 
increasing the expressiveness of refinement proofs compared to CIVL. 

Our proof rule depends on invariants that constrain the reachable states of the 
program. Our second contribution, an aid to our refinement rule but also inde- 
pendently useful, is a new specification idiom called yield invariants—named, 
parameterized, and interference-free invariants that can be called in parallel 
with ordinary procedures to soundly constrain the interference possible at yields 
within the called procedure. Since a yield invariant is named, its definition is 
separate from its invocation, thereby allowing proofs of interference-freedom to 
be performed once and reused for each call site. Since it is parameterized, it can 
be specialized to the needs of a call site by passing suitable input parameters. 
Reasoning with yield invariants becomes difficult in concurrent programs
when the absence of interference must be justified using facts referring to local 
variables of different procedures executing in different threads. The alternative 
of using global ghost variables that have the same information as local variables 
is theoretically possible but impossibly tedious. We observe that local proofs for 
many of these programming patterns can be achieved by exploiting permissions 
that are redistributed by atomic actions and otherwise passed around the pro- 
gram without duplication via input and output parameters of procedures. To 
track permissions, we enhance the interface of yield invariants, procedures, and 
atomic actions with annotations that satisfy a discipline enforced by a combi- 
nation of linear typing [38] over procedure bodies and logical reasoning over the 
transitions of atomic actions. 

The formalization in this paper is the basis of a new design and implementation
of the CIVL verifier. We hope that CIVL will serve researchers as a viable
platform for experimenting with optimizations and implementation decisions. 

To summarize, this paper makes the following contributions: 


— It presents a core language RefPL for expressing modular proofs of refine- 
ment over structured concurrent programs. The formulation of refinement for 
RefPL is general and allows the user to encode verification of an arbitrary 
safety property as refinement verification. Furthermore, RefPL enables the 
construction of layered proofs [25] of safety via iterated refinement. 

— A refinement proof for RefPL is modular and decomposed along program syn- 
tax through the use of yield invariants. The interfaces to procedures, actions, 
and yield invariants exploit a linear typing discipline [38] that enhances local 
verification through the use of permissions. 

— Finally, we present a robust implementation of the refinement rule and yield 
invariants in the CIVL verifier.


1.1 Related Work 


Formal verification techniques based on stepwise refinement have long been advo- 
cated, in theory, for construction of verified programs (e.g., [5,35,36]). This paper 
takes its inspiration from TLA [28] and Event-B [8,4] which popularized refine- 
ment as an approach for reasoning about a concurrent program modeled as a 
transition system. Recent efforts [10,16,17] have developed support for develop- 
ment of verified programs atop the foundation of refinement over transition sys- 
tems. Our work develops a foundation and tool support for refinement over struc- 
tured concurrent programs rather than flat transition systems. We are encour- 
aged by broad interest in the use of automatic program simplification [12,15] to 
reduce the complexity of reasoning about concurrent programs. 

The technique of yield invariants is inspired by interference-free location 
invariants in the work of Owicki and Gries [34] and the rely specification in rely- 
guarantee reasoning [21]. Yield invariants attempt to import the reuse of rely 
specifications to location invariants. We introduce linear interfaces to encode 
permissions to address the practical concern of unwieldy ghost state. While permissions
have been used before for encoding ownership in heap-manipulating
programs [32], our encoding of permissions is different, applicable to any shared 
resource, and targeted specifically at noninterference reasoning. 

There are other efforts to build practical verifiers for concurrent programs. 
Some verifiers focus on automation and target specific programming models and 
languages [7,11,20,29]. Our verifier is just as automated but capable of targeting 
a variety of programming models because of the foundation of atomic actions 
in RefPL. Other verifiers share our focus on expressiveness by providing general 
and certified metatheory [22] but are less automated; our verifier attempts to 
increase expressiveness without sacrificing automation. None of these aforemen- 
tioned verifiers focus on refinement and layered proofs. 

Our work bears a superficial resemblance to proof methods [8,23,37] for 
linearizability [19]. Our work targets the general problem of safety verification. 
Linearizability is a specific safety property to which our method is applicable. 


2 Overview 


In this section, we illustrate our contributions on a set of example programs. 
Section 2.1 presents yield invariants, Sect. 2.2 presents refinement, and Sect. 2.3 
presents linear interfaces. 


2.1 Yield Invariants 


Figure 1 shows a simple RefPL program. The first column shows a global counter 
x, a procedure incr_x that increments x twice, and a yield invariant yield_x that 
characterizes the interference from other threads while a thread is executing 
incr_x. The increments of x on lines 4 and 6 are separated by a call to the yield 
invariant yield_x. RefPL provides a single call statement for calling any number 
(including zero) of procedures and yield invariants in parallel. The preserves spec- 
ification on line 3 indicates that yield_x is both a precondition (usually indicated 
by requires) and a postcondition (usually indicated by ensures). In RefPL, each 
precondition of a procedure is a call to a yield invariant; all preconditions are 
called in parallel at procedure entry. Similarly, each postcondition is a call to a 
yield invariant; all postconditions are called in parallel at procedure exit. 

This paper focuses on reasoning about cooperative semantics in which pre- 
emptions occur only at entry into a procedure, at a call during its execution, 
and at exit. The RefPL verifier proves the correctness of yield_x and incr_x mod- 
ularly on these cooperative semantics. Specifically, the yield invariant yield_x is 
proved interference-free since the only operations in the program that modify x 
increment it. The procedure incr_x is proved by using the precondition of incr_x 
to establish the yield invariant at line 5 and then using the yield invariant to 
prove the postcondition at exit. This proof of incr_x depends on the observation 
that the input parameter _x of incr_x is passed as the argument to the three calls 
to yield_x: in the precondition, on line 5, and in the postcondition. The second
column shows code similar to what we just discussed, except on global variable
y, procedure incr_y, and yield invariant yield_y.

 1 var x: int // > 0             9 var y: int // > 0            17 procedure incr_x_y()
 2 procedure incr_x(_x: int)    10 procedure incr_y(_y: int)    18   requires yield_x(0)
 3   preserves yield_x(_x)      11   preserves yield_y(_y)      19   requires yield_y(0)
 4   x := x + 1                 12   y := y + 1                 20   if (*)
 5   call yield_x(_x)           13   call yield_y(_y)           21     async incr_x_y()
 6   x := x + 1                 14   y := y + 1                 22   call incr_x(0) || yield_y(0)
 7 invariant yield_x(_x: int)   15 invariant yield_y(_y: int)   23   call incr_y(0) || yield_x(0)
 8   _x < x                     16   _y < y                     24   assert 0 < x ∧ 0 < y


Fig. 1. Incrementing two separate counters to illustrate yield invariants.


The third column shows a procedure incr_x_y, which uses recursion to create an
unbounded number of concurrent threads. incr_x_y nondeterministically spawns
a copy of itself on lines 20-21, calls procedures to increment x and y on lines
22-23, and asserts a safety property about x and y on line 24. Our verification goal
is to prove that if a single instance of incr_x_y starts in a state that satisfies the 
initial constraints on x and y, indicated on lines 1 and 9 respectively, then the 
assertion on line 24 holds in every copy of incr_x_y. 

The proof of procedure incr_x_y shows the modularity of yield invariants. 
First, notice that no new yield invariants are needed; the entire proof of incr_x_y 
is achieved by reusing yield_x and yield_y. Specifically, yield_x and yield_y are 
called in parallel with each other at entry, yield_y is called in parallel with incr_x 
at line 22, and yield_x is called in parallel with incr_y at line 23. Second, the 
arguments to yield_x and yield_y are specialized to match the constraints in the 
initial state and the assertions. 
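
The two obligations on a yield invariant described above (stability under interference, and the sequential yield-to-yield steps of incr_x) can be illustrated with a bounded brute-force check. This is only a sketch under stated assumptions: the invariant is taken as `_x < x` exactly as printed in Fig. 1, the names `interference_free` and `incr_x_ok` are hypothetical, and exhaustive testing over a small range is a stand-in for the verification conditions a verifier would actually discharge.

```python
from itertools import product
from typing import Callable

# (snapshot argument, current value of the global x) -> does the invariant hold?
Invariant = Callable[[int, int], bool]


def yield_x(_x: int, x: int) -> bool:
    """Yield invariant of Fig. 1 as printed: the snapshot _x stays below x."""
    return _x < x


def increment(x: int) -> int:
    """The only operation in the example program that modifies x."""
    return x + 1


def interference_free(inv: Invariant, bound: int = 10) -> bool:
    """Bounded check: an increment by another thread never breaks inv."""
    for _x, x in product(range(-bound, bound + 1), repeat=2):
        if inv(_x, x) and not inv(_x, increment(x)):
            return False
    return True


def incr_x_ok(inv: Invariant, bound: int = 10, max_interference: int = 3) -> bool:
    """Bounded check of incr_x: precondition -> yield on line 5 -> postcondition."""
    for _x, x0 in product(range(-bound, bound + 1), repeat=2):
        if not inv(_x, x0):            # precondition (line 3)
            continue
        x1 = increment(x0)             # line 4
        if not inv(_x, x1):            # yield invariant at line 5
            return False
        for k in range(max_interference + 1):
            x2 = x1 + k                # other threads may increment x at the yield
            x3 = increment(x2)         # line 6
            if not inv(_x, x3):        # postcondition (line 3)
                return False
    return True
```

An invariant such as `_x == x` fails the first check, which is why a lower bound rather than an exact snapshot is used as the yield invariant.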


2.2 Refining Atomic Actions 


Figure 2 shows a spin lock implementation and a client that uses the spin lock to 
atomically increment a shared counter. Procedure Acquire (lines 22-28) acquires 
the lock and procedure Release (lines 29-34) releases the lock. Both procedures 
use a primitive atomic action CAS (compare-and-swap) defined on lines 10- 
14 with two parameters—old_b and new_b. This action compares the value of a 
global variable b to old_b. If they are equal, b is set to new_b and true is returned, 
otherwise, b is not modified and false is returned. Acquire attempts to set b from 
false to true repeatedly via recursive call to itself (line 28) until it succeeds. 
Release sets b back to false from true. 

Procedure Incr (lines 16-21) atomically increments the global variable count 
by acquiring the lock, reading count into a local variable t by calling Read 
(lines 35-39), writing t+1 back to count by calling Write (lines 40-43), and 
finally releasing the lock. We prove that Incr implements an atomic increment 
via a sequence of two refinement steps. 
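
For readers who want to run the protocol, the following Python sketch mirrors the structure of Fig. 2. It is an illustration under assumptions, not the paper's artifact: the class name `SpinLockCounter` and the helper `run` are hypothetical, the atomicity of CAS is modeled with an internal `threading.Lock` (standing in for a hardware primitive), and Acquire is written as a loop instead of a recursive call.

```python
import threading
import time


class SpinLockCounter:
    """Counter protected by a CAS-based spin lock, following Fig. 2."""

    def __init__(self) -> None:
        self.b = False        # concrete lock bit, initially false
        self.count = 0        # shared counter
        self._atomic = threading.Lock()  # models atomicity of CAS only

    def cas(self, old_b: bool, new_b: bool) -> bool:
        """Primitive action CAS: compare b to old_b and swap on success."""
        with self._atomic:
            success = self.b == old_b
            if success:
                self.b = new_b
            return success

    def acquire(self) -> None:
        # Retry until b goes from False to True (the paper recurses here).
        while not self.cas(False, True):
            time.sleep(0)     # let other threads run while spinning

    def release(self) -> None:
        self.cas(True, False)

    def incr(self) -> None:
        self.acquire()
        t = self.count        # Read
        self.count = t + 1    # Write
        self.release()


def run(threads: int = 4, per_thread: int = 200) -> int:
    """Run several incrementing threads and return the final counter value."""
    counter = SpinLockCounter()

    def worker() -> None:
        for _ in range(per_thread):
            counter.incr()

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter.count
```

Because every read/write pair happens while the lock bit is held, the final value equals the total number of calls to `incr`, which is the behavior the atomic specification of Incr describes.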

The first step abstracts the procedures Acquire, Release, Read, and Write into 
atomic actions AcquireSpec, ReleaseSpec, ReadSpec, and WriteSpec, respectively.

 1 // Concrete global variables
 2 var b: bool // false
 3 var count: int
 4 // Abstract global variable
 5 var l: Option(Tid) // None
 6 // Supporting invariant
 7 invariant LockInv()
 8   b ⟺ (l ≠ None)
 9 // Primitive actions
10 action CAS(old_b: bool, new_b: bool)
11   returns (success: bool)
12   success := b = old_b
13   if (success)
14     b := new_b
15 // Atomic increment
16 procedure Incr(linear tid: Tid)
17   preserves LockInv()
18   call Acquire(tid)
19   call t := Read(tid) || LockInv()
20   call Write(tid, t+1) || LockInv()
21   call Release(tid)

22 procedure Acquire(
23     linear tid: Tid)
24   refines AcquireSpec
25   preserves LockInv()
26   exec t := CAS(false, true)
27   if (t) l := Some(tid)
28   else call Acquire(tid)
29 procedure Release(
30     linear tid: Tid)
31   refines ReleaseSpec
32   preserves LockInv()
33   exec CAS(true, false)
34   l := None
35 procedure Read(
36     linear tid: Tid)
37   returns (v: int)
38   refines ReadSpec
39   v := count;
40 procedure Write(
41     linear tid: Tid, v: int)
42   refines WriteSpec
43   count := v;

   action AcquireSpec(
       linear tid: Tid)
     assume l = None
     l := Some(tid)
   action ReleaseSpec(
       linear tid: Tid)
     assert l = Some(tid)
     l := None
   action ReadSpec(
       linear tid: Tid)
     returns (v: int)
     assert l = Some(tid)
     v := count
   action WriteSpec(
       linear tid: Tid, v: int)
     assert l = Some(tid)
     count := v

Fig. 2. Spin lock to illustrate refinement of atomic actions. 


These atomic actions, defined in Fig. 2, provide an explicit
specification of the locking protocol for accessing the shared variable count. The
specification of these actions requires the introduction of (1) a local parameter
tid containing the unique id of the thread executing the code, and (2) a global
variable l whose value is either None when the lock is not held or Some(tid)
when the lock is held by thread tid. The second step uses these atomic actions
to abstract Incr to an atomic action that increments count by 1.

There are two challenges in the first refinement proof. First, the lock imple- 
mentation is defined using the concrete Boolean variable b, whereas the lock 
specification is defined using the logical lock variable l. Second, the implementation
of Acquire is recursive, which is technically challenging for refinement
reasoning. The solution to the first problem is to introduce l and hide b during
the refinement proof. To introduce l into the concrete program, it is updated
appropriately when Acquire (line 27) and Release (line 34) complete successfully.
Furthermore, the relationship between the variables b and l is captured by the
yield invariant LockInv (lines 7-8) which is used in the precondition and postcon- 
dition of Acquire and Release. The solution to the second problem is a powerful 
rule for refinement reasoning, described in Sect. 4, which allows the recursive call 
to Acquire on line 28 to be replaced by a call to the specification AcquireSpec 
while modularly proving that the body of Acquire refines AcquireSpec. 

To set up the second refinement proof, the procedure calls in the body of Incr
are replaced by invocations of the corresponding abstract atomic actions:

  procedure Incr(linear tid: Tid)
    exec AcquireSpec(tid)
    exec t := ReadSpec(tid)
    exec WriteSpec(tid, t+1)
    exec ReleaseSpec(tid)
  action IncrSpec()
    count := count + 1

The rewritten body of Incr is preemption-free; a yield may occur only at the
beginning or the end. This assumption is justified by a commutativity analysis
based on the observation that AcquireSpec is a right mover, ReleaseSpec is
a left mover, and ReadSpec and WriteSpec are both movers [14]. Proving these 
mover types requires that the tid input parameters of two concurrent actions 
are distinct, which is specified by the linear annotation. In addition to encoding 
distinctness of values, linear variables can be used for encoding disjointness of 
permissions associated with values. We present an example illustrating permis- 
sions in Sect. 2.3 and a detailed technical description in Sect. 4. 

For the proof that procedure Incr refines the action IncrSpec, which increments
count atomically, we do not need the invariant LockInv anymore; in fact
we do not need any invariant. Furthermore, the local parameter tid and the global
variable l are no longer needed in the program and can be hidden. Hiding local
variables is a novel feature of the refinement method described in this paper. The 
capability to introduce and subsequently hide global and local variables allows 
us to chain a sequence of refinement steps, localizing the use of variables to the 
parts of the proof that need them. 


2.3 Linear Interfaces 


Figure 3 shows a synchronization protocol extracted from a verified concurrent 
garbage collector [18]. There are N mutator threads (procedure Mutator on 
line 28) numbered from 1 to N, and one collector thread (procedure Collector 
on line 38) with ID 0. The protocol uses barrier synchronization to ensure that no
mutator accesses memory (line 37) while the collector is doing a root scan
(line 44). Before the collector runs, it sets the Boolean variable
barrierOn to true (line 40) and waits until the integer variable barrierCounter gets 
0 (line 42). Before a mutator accesses memory, it reads barrierOn (line 31). If false, 
the mutator goes ahead. Otherwise, it signals to the collector by decrementing 
barrierCounter (line 34) and waits for barrierOn to be reset to false (line 36). 
This example declares both global and local linear variables (specified by 
linear, linear_in, linear_out). Every linear variable—or more precisely, its current 
value—is assigned a set of permissions of type Perm according to the collector 
functions C1, C2, and C3. A linear integer i holds both Left(i) and Right(i), a 
set of integers holds the corresponding Left permissions, and a Perm value holds 
itself. Note that Perm is not special; any value can be a permission. For every 
program location we can compute the set of available linear variables. For exam- 
ple, when a mutator enters the barrier (line 34), i becomes unavailable because 
the permission Left(i) is transferred to the ghost variable mutatorsInBarrier. Then 
i becomes available again after exiting the barrier (line 36). Global linear vari- 
ables (mutatorsInBarrier here) are always available. Parameterized by the linear 
collectors, our linearity framework establishes the generic invariant that all per- 
missions across all available linear variables are disjoint. Now suppose that some 
mutator i is at line 37, where it holds both of its permissions and in particular 
Left(i), while the collector is at line 45, where mutatorsInBarrier holds all Left
permissions and in particular Left(i). This situation is impossible, since the linearity
feature of RefPL rules out any duplication of permissions.

 1 datatype Perm = Left(int) | Right(int)
 2 function linear C1(i: int) = {Left(i), Right(i)}
 3 function linear C2(ids: Set(int)) = {Left(i) | i ∈ ids}
 4 function linear C3(p: Perm) = {p}
 5 const N: int // positive
 6 var barrierOn: bool // false
 7 var barrierCounter: int // N
 8 var linear mutatorsInBarrier: Set(int) // ∅
 9 // Primitive actions
10 action IsBarrierOn() returns (b: bool)
11   b := barrierOn
12 action EnterBarrier(linear_in i: int)
13   returns (linear_out p: Perm)
14   assert i ∈ [1..N]
15   mutatorsInBarrier := mutatorsInBarrier + {i}
16   barrierCounter := barrierCounter − 1
17   p := Right(i)
18 action WaitForBarrierRelease
19     (linear_in p: Perm, linear_out i: int)
20   assert p = Right(i) ∧ i ∈ mutatorsInBarrier
21   assume ¬barrierOn
22   mutatorsInBarrier := mutatorsInBarrier − {i}
23   barrierCounter := barrierCounter + 1
24 action SetBarrier(b: bool)
25   barrierOn := b
26 action WaitBarrier()
27   assume barrierCounter = 0

28 procedure Mutator(linear i: int)
29   requires i ∈ [1..N] preserves BarrierInv()
30   var b: bool, p: Perm
31   exec b := IsBarrierOn()
32   if (b)
33     call BarrierInv()
34     exec p := EnterBarrier(i)
35     call BarrierInv() || MutatorInv(p, i)
36     exec WaitForBarrierRelease(p, i)
37   // access memory here
38 procedure Collector(linear i: int)
39   requires i = 0 preserves BarrierInv()
40   exec SetBarrier(true)
41   call BarrierInv() || CollectorInv(i, false)
42   exec WaitBarrier()
43   call BarrierInv() || CollectorInv(i, true)
44   // do root scan here
45   assert mutatorsInBarrier = [1..N]
46   exec SetBarrier(false)
47 // Supporting invariants
48 invariant BarrierInv()
49   mutatorsInBarrier ⊆ [1..N] ∧
50   size(mutatorsInBarrier) + barrierCounter = N
51 invariant MutatorInv(linear p: Perm, i: int)
52   p = Right(i) ∧ i ∈ mutatorsInBarrier
53 invariant CollectorInv(linear i: int, done: bool)
54   i = 0 ∧ barrierOn ∧
55   (done ⟹ mutatorsInBarrier = [1..N])


Fig. 3. Barrier synchronization to illustrate linear interfaces. 


The strength of linearity, which leads to a less tedious verification task, is 
that its invariant connects variables from different scopes, without the need to 
explicitly state (and prove) this invariant. The programmer only provides a linearity
specification which is checked automatically (see Sect. 4). The resulting
guarantees can then be assumed “for free”. In contrast, even stating a corre- 
sponding invariant requires the introduction of auxiliary global variables and 
helper invariants to connect them to local variables. 
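
The disjointness argument of this section can be replayed concretely. The sketch below encodes the collector functions C1, C2, and C3 of Fig. 3 in Python and tests whether the permissions held by the available linear variables form a set. The tuple encoding of Perm, the choice N = 3, and the two example states are assumptions made for illustration only.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Perm = Tuple[str, int]  # ("Left", i) or ("Right", i); an assumed encoding


def c1(i: int) -> List[Perm]:
    """Collector for a linear integer: it holds Left(i) and Right(i)."""
    return [("Left", i), ("Right", i)]


def c2(ids: Iterable[int]) -> List[Perm]:
    """Collector for a linear set of integers: the Left permissions."""
    return [("Left", i) for i in ids]


def c3(p: Perm) -> List[Perm]:
    """Collector for a linear Perm value: it holds itself."""
    return [p]


def is_set(perms: Iterable[Perm]) -> bool:
    """Linear disjointness: no permission occurs more than once."""
    return all(count == 1 for count in Counter(perms).values())


N = 3
all_mutators = list(range(1, N + 1))

# Hypothetical bad state: mutator 1 is at line 37 with i available, while
# the collector (i = 0) is at line 45 with mutatorsInBarrier = [1..N].
bad = c1(1) + c2(all_mutators) + c1(0)

# Reachable-looking state: every mutator waits in the barrier holding only
# p = Right(i); its parameter i is unavailable and contributes nothing.
good = [perm for i in all_mutators for perm in c3(("Right", i))]
good += c2(all_mutators) + c1(0)
```

In `bad`, the permission Left(1) is collected twice (once from the mutator's `i`, once from `mutatorsInBarrier`), so `is_set(bad)` is false, which matches the impossibility argument in the text.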


3 RefPL: Syntax and Semantics 


In this section we present RefPL, a core programming language which is carefully 
designed to be (1) a minimal yet general modeling language to express concurrent 
programs, (2) able to express invariants over program executions, and (3) suit- 
able for expressing (refinement-based) program transformations. RefPL focuses 
on interfaces for modular verification, while abstracting from detailed expression 
syntax and types. 


Syntax. Figure 4 (top panel) summarizes the syntax of RefPL. We assume sets
of names which we use to name actions ($A$), procedures ($P$, $Q$), yield invariants
($Y$), and statement labels ($\lambda$). A set of variables is partitioned into global and
local variables, and a store $\sigma$ is a partial map from variables to values. We write
$\sigma' \subseteq \sigma$ if $\sigma$ is an extension of $\sigma'$, $\sigma|_V$ for the restriction of $\sigma$ to $V$, $\sigma[\sigma']$ for the
store that is like $\sigma'$ on $dom(\sigma')$ and otherwise like $\sigma$, and $g \cdot \ell$ for the combination
of a global and local store. A program consists of a finite set of global variables
gs, a partial map as from action names to actions, and a partial map ps from
procedure names to procedures. Both actions and procedures have an interface
of input variables $I$ and output variables $O$, and procedures have additional local
variables $L$. A (gated atomic) action [13,26] consists of a gate $\rho$ and a transition
relation $\tau$. The gate is a set of stores (i.e., a predicate) over $gs \cup I$. Executing the
action in a state that does not satisfy the gate fails the execution. Otherwise,
every transition $(\sigma, \sigma', \Omega)$ in $\tau$ describes a possible atomic state transition from $\sigma$
(over $gs \cup I$) to $\sigma'$ (over $gs \cup O$), together with the creation of new asynchronous
threads according to a set of pending asyncs $\Omega$; every pending async $(\ell, P) \in \Omega$
is turned into a new thread that executes procedure $P$ with input store $\ell$. A
procedure consists of a statement $s$ that is composed of standard control-flow
commands and two call commands: exec to invoke actions and call for the
parallel invocation of multiple procedures. Every entry in the invocation sequence
of a call is called an arm of the call, and the label $\lambda$ is used to attach specification
information to the call. Parameter passing is expressed using an input map $\iota$ from
the callee's formals $I$ to the caller's actuals $I \cup O \cup L$, and an injective output
map $o$ from the callee's formals $O$ to the caller's actuals $O \cup L$. Input variables
are immutable, since they are not mapped to by output maps and the variables
of a procedure are not modified anywhere else. Output and local variables of a
procedure are initialized to the default value $\circledast$. In RefPL, loops are modeled
using recursion, and conditional statements are modeled using nondeterministic
branching ($*$) and actions that assume the branching condition.
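
To make the notion of a gated atomic action concrete, here is a small executable model in Python. It is a sketch under assumptions: stores are plain dictionaries, the names `Action`, `exec_action`, and `GateFailure` are hypothetical, and the mapping of `assert` to the gate and of `assume` to a blocking transition relation follows the usual reading of the actions in Fig. 2 rather than a definition given in this section.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Store = Dict[str, object]
PendingAsync = Tuple[Store, str]            # (input store, procedure name)
Transition = Tuple[Store, List[PendingAsync]]


class GateFailure(Exception):
    """Executing an action outside its gate fails the execution."""


@dataclass
class Action:
    gate: Callable[[Store], bool]               # predicate over the store
    trans: Callable[[Store], List[Transition]]  # possible successors; [] blocks


def exec_action(action: Action, store: Store) -> List[Transition]:
    """Return all possible (store', pending asyncs) or fail on a gate violation."""
    if not action.gate(store):
        raise GateFailure("gate violated")
    return action.trans(store)


def acquire_spec(tid: int) -> Action:
    # assume l = None; l := Some(tid)  (blocks, but never fails)
    return Action(
        gate=lambda s: True,
        trans=lambda s: [({**s, "l": tid}, [])] if s["l"] is None else [],
    )


def release_spec(tid: int) -> Action:
    # assert l = Some(tid); l := None  (fails if the caller does not hold the lock)
    return Action(
        gate=lambda s: s["l"] == tid,
        trans=lambda s: [({**s, "l": None}, [])],
    )
```

For example, `exec_action(release_spec(1), {"l": 2})` raises `GateFailure`, while `exec_action(acquire_spec(1), {"l": 2})` returns an empty list because the action is merely blocked.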


Type Checking. For a program we require that (1) the action name in an exec
statement is in $dom(as)$, (2) the procedure names in a call statement are in
$dom(ps)$, and the actual outputs of all arms are disjoint from each other and all
actual inputs, and (3) for every pending async $(\ell, P)$ in the transition relation
of an action in $img(as)$, $P \in dom(ps)$ and $dom(\ell)$ contains all inputs of $P$.


Semantics. Figure 4 (bottom panel) presents the operational semantics of
RefPL, a transition relation $\Rightarrow$ over configurations that consist of a global store
over gs and a finite multiset of threads. Each thread is a tree (which generalizes
a call stack); a call statement creates new leaf nodes (Lf) and blocks the caller
in an internal node (Nd) until all arms of the parallel call finish. Each tree node
contains a frame $(P, \ell, s)$ that represents the current state of a procedure $P$ during
execution: $\ell$ is the procedure's current local store and $s$ is a statement that
remains to be executed. In the definition of $\Rightarrow$ we use several evaluation contexts
that have a unique hole $\bullet$; filling the hole is denoted by $\cdot[\cdot]$. In particular, $SC[s]$
is a statement with $s$ in evaluation position, and $PC[t]$ is a multiset of thread
trees where $t$ is a subtree in one of these trees. The operator $\circ$ means function
or relation composition.

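Atomic actions (invoked through the exec command) execute directly in the
context of the caller; inline, if you will. If the current store does not satisfy the
gate of an executed action, the execution stops in the failure configuration $\lightning$. It
is important to appreciate the generality of atomic actions. First, they can represent
atomic operations at an arbitrary level of granularity, from fine-grained
low-level operations (e.g., as implemented in hardware) to coarse-grained summaries
(e.g., obtained as part of a layered proof). Second, the notion of pending
asyncs subsumes the need for a dedicated asynchronous call statement, and
enables advanced proof techniques for asynchronous programs [24,26]. Finally,
all accesses to global variables are confined to atomic actions.

Syntax (top panel):

$$
\begin{array}{ll}
\multicolumn{2}{l}{A \in ActionName \qquad P, Q \in ProcName \qquad Y \in InvName \qquad \lambda \in Label} \\
Val \ni \circledast & s \in Stmt ::= \mathsf{skip} \mid s ; s \mid s * s \\
v \in Var = GVar \cup LVar & \qquad \mid \mathsf{call}_\lambda\ \overline{(P, \iota, o)} \mid \mathsf{exec}\ (A, \iota, o) \\
g \in GStore = GVar \rightharpoonup Val & I, O, L \in 2^{LVar} \\
\ell \in LStore = LVar \rightharpoonup Val & Action ::= (I, O, \rho, \tau) \\
\sigma \in Store = Var \rightharpoonup Val & Proc ::= (I, O, L, s) \\
\rho \in Gate = 2^{Store} & gs \in 2^{GVar} \\
\tau \in Trans = 2^{Store \times Store \times PASet} & as \in ActionName \rightharpoonup Action \\
\Omega \in PASet = 2^{LStore \times ProcName} & ps \in ProcName \rightharpoonup Proc \\
\iota, o \in IOMap = LVar \rightharpoonup LVar & P \in Prog ::= (gs, as, ps)
\end{array}
$$

Proof annotations (middle panel):

$$
\begin{array}{ll}
Inv ::= (I, \rho) & lg \in 2^{GVar} \\
InvCall ::= (Y, \iota) & li \in (ActionName \cup ProcName \cup InvName) \times \{\triangleright, \triangleleft\} \to 2^{LVar} \\
ys \in InvName \rightharpoonup Inv & lo \in (ActionName \cup ProcName) \to 2^{LVar} \\
pre, post \in ProcName \to 2^{InvCall} & lc \in Val \to 2^{Val} \\
inv \in Label \to 2^{InvCall} & \mathcal{L} ::= (lg, li, lo, lc) \\
\mathcal{Y} ::= (ys, pre, post, inv) & \\
ref \in ProcName \rightharpoonup ActionName & \\
mark \in Label \to \{0, m\} \cup \mathbb{N} & \\
\mathcal{R} ::= (ref, mark) &
\end{array}
$$

Operational semantics (bottom panel):

$$
\begin{array}{ll}
f ::= (P, \ell, s) & SC ::= \bullet_s \mid SC ; s \\
t ::= \mathsf{Lf}\ f \mid \mathsf{Nd}\ f\ T & TC ::= \bullet_t \mid \mathsf{Nd}\ f\ (\{TC\} \uplus T) \\
T ::= \{t, \ldots, t\} & PC ::= \{TC\} \uplus T \\
c ::= (g, T) \mid \lightning & LC ::= PC[\mathsf{Lf}\ (P, \bullet_\ell, SC)]
\end{array}
$$

For $ps(Q) = (I, O, L, s)$ let $init(Q, \ell) = (Q,\ \ell|_I \cup [v \mapsto \circledast]_{v \in O \cup L},\ s)$.

$$
\begin{array}{ll}
(\text{call}) & (g, PC[\mathsf{Lf}\ (P, \ell, SC[\mathsf{call}_\lambda\ \overline{(Q_i, \iota_i, o_i)}])]) \Rightarrow
  (g, PC[\mathsf{Nd}\ (P, \ell, SC[\mathsf{call}_\lambda\ \overline{(Q_i, \iota_i, o_i)}])\ \overline{\mathsf{Lf}\ init(Q_i, \ell \circ \iota_i)}]) \\[4pt]
(\text{return}) & (g, PC[\mathsf{Nd}\ (P, \ell, SC[\mathsf{call}_\lambda\ \overline{(Q_i, \iota_i, o_i)}])\ \overline{\mathsf{Lf}\ (Q_i, \ell_i, \mathsf{skip})}]) \Rightarrow
  (g, PC[\mathsf{Lf}\ (P, \ell[\overline{\ell_i \circ o_i^{-1}}], SC[\mathsf{skip}])]) \\[4pt]
(\text{exec}) & \dfrac{as(A) = (\_, \_, \rho, \tau) \quad \hat{g} \subseteq g \quad (\hat{g} \cdot (\ell \circ \iota),\ \hat{g}' \cdot \hat{\ell}',\ \Omega) \in \rho \circ \tau \quad
  g' = g[\hat{g}'] \quad \ell' = \ell[\hat{\ell}' \circ o^{-1}] \quad T' = \{\mathsf{Lf}\ init(Q, \ell'') \mid (\ell'', Q) \in \Omega\}}
  {(g, PC[\mathsf{Lf}\ (P, \ell, SC[\mathsf{exec}\ (A, \iota, o)])]) \Rightarrow (g', PC[\mathsf{Lf}\ (P, \ell', SC[\mathsf{skip}])] \uplus T')} \\[8pt]
(\text{fail}) & \dfrac{as(A) = (\_, \_, \rho, \_) \quad \nexists\, \hat{g} \subseteq g : \hat{g} \cdot (\ell \circ \iota) \in \rho}
  {(g, LC[\ell][\mathsf{exec}\ (A, \iota, o)]) \Rightarrow \lightning} \\[8pt]
(\text{choice}) & \dfrac{s' \in \{s_1, s_2\}}{(g, LC[\ell][s_1 * s_2]) \Rightarrow (g, LC[\ell][s'])} \\[8pt]
(\text{skip}) & (g, LC[\ell][\mathsf{skip} ; s]) \Rightarrow (g, LC[\ell][s]) \\[4pt]
(\text{stop}) & (g, \{\mathsf{Lf}\ (\_, \_, \mathsf{skip})\} \uplus T) \Rightarrow (g, T)
\end{array}
$$

Fig. 4. The programming language RefPL: syntax (top panel), proof annotations (middle
panel), and operational semantics (bottom panel).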

We distinguish between the preemptive semantics and the cooperative semantics
of a program. The preemptive semantics $\Rightarrow$ defines the standard fine-grained
behaviors of a concurrent program, where a context switch can happen at any 
time. A program should be proved correct under its preemptive semantics. How- 
ever, for reasoning purposes we consider a cooperative semantics, where context 
switches only happen at procedure calls and returns. We call these locations 
yields. The justification for reducing reasoning about preemptive semantics to 
cooperative semantics is outside the scope of this paper (CIVL uses commuta- 
tivity reasoning and a reduction argument). 

A leaf node Lf(P, _, s) is yielding if it denotes the entry or exit of procedure P,
i.e., if ps(P) = (_,_,_,s) or s = skip. A configuration is yielding if all leaves are
yielding, and cooperative if at most one leaf is not yielding. Then the cooperative
semantics is given by restricting $\Rightarrow$ to cooperative configurations. Notice that
the configuration after an exec might be non-yielding. Thus, under cooperative 
semantics the pending asyncs created by exec can only start executing once the 
caller reaches the next yield. We note that arbitrary yields can be modeled with 
“empty” parallel calls (i.e., a call with no arms). 

A yield-to-yield fragment $\{P \mid \kappa_1\}\ \vec{e}\ \{\kappa_2\}$ of a procedure $P$ is any sequence
of exec statements $\vec{e}$ that forms a path in $P$ from $\kappa_1$ to $\kappa_2$, where $\kappa_1$ and $\kappa_2$
are either call statements, $\bot$, or $\top$ ($\kappa_1 = \bot$ for procedure entries; $\kappa_2 = \top$ for
procedure exits). For example, procedure Acquire in Fig. 2 has three yield-to-yield
fragments: (A1) entry/successful CAS/then branch/exit, (A2) entry/failed
CAS/call in the else branch, and (A3) call in the else branch/exit (i.e., an "empty"
fragment). Let $Gate(\vec{e})$ be the set of stores from which executing $\vec{e}$ cannot fail,
and let $Trans(\vec{e})$ be the set of tuples $(\sigma, \sigma', \Omega)$ where executing $\vec{e}$ from store
$\sigma$ can result in $\sigma'$ with all created pending asyncs collected in $\Omega$. We define a
reduced transition relation $\Rrightarrow$ over yielding configurations, such that $c \Rrightarrow c'$ if
and only if there are cooperative but non-yielding configurations $(c_i)_{1 \le i \le n}$, $n \ge 0$,
with $c \Rightarrow c_1 \Rightarrow \cdots \Rightarrow c_n \Rightarrow c'$. Thus, every step in $\Rrightarrow$ corresponds to the
execution of a yield-to-yield fragment under cooperative semantics.


4 Abstracting RefPL Programs 


This section presents a proof rule for transforming a concurrent program P into 
a concurrent program P’ such that there is a simulation between the cooperative 
executions of P and P’. The transformation comprises variable hiding (P’ has 
fewer global and local variables than P) and procedure abstraction (procedures 
in P are summarized to atomic actions in P'). Our proof rule takes as input a
yield specification $\mathcal{Y}$, a linearity specification $\mathcal{L}$, and a refinement specification
$\mathcal{R}$ (see Fig. 4), and decomposes the refinement verification problem as follows.

$$
\frac{Linearity(P, \mathcal{Y}, \mathcal{L}) \qquad Safety(P, \mathcal{Y}, \mathcal{L}) \qquad Refinement(P, \mathcal{Y}, \mathcal{L}, \mathcal{R}, P')}
     {\mathcal{Y}, \mathcal{L}, \mathcal{R} \vdash P \preccurlyeq P'}
$$


The yield specification declares yield invariants and attaches them to pro- 
gram locations, and the linearity specification declares linear interfaces and 
sets up a permission discipline (Sect. 4.1). The Linearity judgment (Sect. 4.2) 
ensures that the linear interfaces of procedures, actions, and invariants in P 
are valid, which establishes a linear disjointness property. The Safety judgment 
(Sect. 4.3) ensures that preconditions, postconditions, and invariants in P are 
valid and interference-free, which captures reachability information in P. Note 
that Linearity and Safety interact, as yield invariants can have a linear interface 
and safety checking assumes the guarantees of linearity checking. In our proof 
rule, the guarantees of Linearity (Lemma 1) and Safety (Lemma 2) establish 
the context for refinement checking. However, we stress that these guarantees 
are useful on their own, independent of refinement. The refinement specifica- 
tion (Sect. 4.4) declares how P is converted to P’, and the Refinement judgment 
ensures that every execution of P is simulated by an execution of P’ (Theorem 
1). In Sect. 5 we show how all of our obligations are implemented in practice. 


4.1 Yield Invariants and Linear Interfaces 


RefPL supports yield invariants of the form $(I, \rho)$, where $I$ are input variables
and $\rho$ is a gate over $gs \cup I$. In a yield specification $\mathcal{Y} = (ys, pre, post, inv)$, the
map ys assigns invariant names to yield invariants, such that invariants can be
"invoked" by name (similar to actions and procedures) by supplying an input
map $\iota$. We will write $\varphi$ and $\psi$ for sets of such invariant calls, and $\sigma \models \varphi$ to denote
that store $\sigma$ satisfies $\varphi$, i.e.,
$g \cdot \ell \models \varphi \iff \forall (Y, \iota) \in \varphi.\ \exists \hat{g} \subseteq g : \hat{g} \cdot (\ell \circ \iota) \in ys(Y).\rho$.
Then invariant calls are assigned to program locations as follows: $pre(P)$ are
the preconditions that must hold on entry to procedure $P$, $post(P)$ are the
postconditions that must hold on exit from procedure $P$, and $inv(\lambda)$ are the
invariants that must hold at calls labeled with $\lambda$. These are the yield locations
in the cooperative semantics, under which we will show the invariants correct
and stable under interference.

RefPL supports linear permissions to enhance local reasoning. The core idea 
of linearity is to identify a subset of (linear) available variables among all vari- 
ables in all frames of a configuration. Every value stored in an available variable 
is mapped to a set of values called permissions, with the desired property that the 
values in available variables are mapped to disjoint permissions. This disjointness 
property can then be used as a free assumption in other verification conditions.

In a linearity specification $\mathcal{L} = (lg, li, lo, lc)$, the linear global variables lg
are a subset of gs, which are always available. For every action/procedure/invariant
name $X$, $li(X, \triangleright)$ and $li(X, \triangleleft)$ are subsets of its input variables called
linear-in and linear-out, respectively. The linear-ins expect to receive from an
available actual parameter, while the linear-outs ensure that their actual parameter
will be available upon return. An input variable can be both linear-in and
linear-out (which we assume for all invariants). For every action/procedure name
$X$, its linear outputs $lo(X)$ are a subset of its output variables, such that the
receiving actual return parameters become available when $X$ returns. For example,
in Fig. 3 the global variable mutatorsInBarrier is linear, procedure Mutator
and yield invariant CollectorInv have a linear (linear-in and linear-out) input
i, action EnterBarrier has linear-in input i and linear output p, and
WaitForBarrierRelease has a linear-in input p and linear-out input i. The permissions
assigned to an available variable are determined by a linear collector function lc,
which is a flexible mechanism to encode various permission disciplines. For
convenience, we lift lc to collect all permissions of a set of variables $V$ in store $\sigma$, i.e.,
$lc(\sigma, V) = \biguplus_{v \in V} lc(\sigma(v))$. A simple example of a collector function that expresses
unique identifiers (as needed in Fig. 2) would return the singleton set {tid} for a
thread identifier variable tid. Figure 3 shows a more advanced usage, where the
definition of lc is split across the functions C1, C2, and C3 (see Sect. 2.3).


4.2 Linearity 


Let us assign to every (sub)statement $s$ in P a linear type ${}^{in}_{out}$, written as $s : {}^{in}_{out}$,
where in/out is the set of local variables available directly before/after executing
$s$. Based on the linear interfaces in li and lo, the most general linear types can
be inferred, but for simplicity we assume all types to be given and define a type
checker below. Since linear types annotate each program location with available
variables, we can define the collection of linear permissions over a configuration
$c = (g, T)$ as

$$
lc(c) = lc(g, lg) \uplus \biguplus_{(P, \ell, s : {}^{in}_{out})} lc(\ell, in),
$$

where $(P, \ell, s : {}^{in}_{out})$ ranges over all frames in all nodes of $T$. Then the linear
disjointness property for a configuration $c$ is $IsSet(lc(c))$, where $IsSet(\cdot)$ states
that a multiset does not contain duplicates. We call such a configuration $\mathcal{L}$-valid.
The $Linearity(P, \mathcal{Y}, \mathcal{L})$ judgment comprises a semantic check on actions and a
syntactic check on procedures, which ensures the preservation of the linear
disjointness property as follows.

Lemma 1. Let $c$ be an $\mathcal{L}$-valid configuration of P. If $c \Rightarrow c'$ then $c'$ is $\mathcal{L}$-valid.

Essentially, an execution starts with a set of permissions and redistributes these
in every step. The permissions can stay the same or decrease, but never increase.


Linear Action Checking. All state updates (other than parameter passing)
are confined to atomic actions. We need to ensure that the outgoing permissions
of an action are always a subset of the incoming permissions. Thus, for every
$A \in dom(as)$ with $as(A) = (\_, \_, \rho, \tau)$ we check

$$
(g \cdot \ell, g' \cdot \ell', \Omega) \in \rho \circ \tau \;\wedge\; inPerm = lc(g, lg) \uplus lc(\ell, li(A, \triangleright)) \;\wedge\; IsSet(inPerm) \implies
$$
$$
\Big( lc(g', lg) \uplus lc(\ell, li(A, \triangleleft)) \uplus lc(\ell', lo(A)) \uplus \biguplus_{(\ell'', P) \in \Omega} lc(\ell'', li(P, \triangleright)) \Big) \subseteq inPerm.
$$

Starting with a set of permissions in the linear globals and linear-in inputs, the
action can redistribute these permissions among the linear globals, its linear-out
inputs and linear outputs, and the linear-ins of pending asyncs, but permissions
cannot appear out of thin air. Notice that this check depends on the user-provided
linear collector function lc. For example, consider action EnterBarrier
in Fig. 3. The linear-in input i holds the permissions Left(i) and Right(i) on entry
(cf. collector C1). By adding i to mutatorsInBarrier we hand over the permission
Left(i) (cf. collector C2), and by the assignment to the linear output p we hand
over the permission Right(i) (cf. collector C3). Thus, the set of permissions in
mutatorsInBarrier and i before is the same as the permissions in mutatorsInBarrier
and p after executing EnterBarrier.

$$
\frac{out \subseteq in}{\mathsf{skip} : {}^{in}_{out}}
\qquad
\frac{s_1 : {}^{in}_{out} \quad s_2 : {}^{out}_{out'}}{s_1 ; s_2 : {}^{in}_{out'}}
\qquad
\frac{s_1 : {}^{in}_{out_1} \quad s_2 : {}^{in}_{out_2}}{s_1 * s_2 : {}^{in}_{out_1 \cap out_2}}
$$
$$
\frac{\iota(li(A, \triangleright)) \subseteq in \qquad out \subseteq (in \setminus \iota(li(A, \triangleright))) \uplus \iota(li(A, \triangleleft)) \uplus o(lo(A))}
     {\mathsf{exec}\ (A, \iota, o) : {}^{in}_{out}}
$$
$$
\frac{\begin{array}{c}
\big(\biguplus_i \iota_i(li(P_i, \triangleright))\big) \uplus \big(\biguplus_{(Y, \iota) \in inv(\lambda)} \iota(li(Y, \triangleright))\big) \subseteq in \\
out \subseteq \big(in \setminus \biguplus_i \iota_i(li(P_i, \triangleright))\big) \uplus \big(\biguplus_i \iota_i(li(P_i, \triangleleft))\big) \uplus \big(\biguplus_i o_i(lo(P_i))\big)
\end{array}}
     {\mathsf{call}_\lambda\ \overline{(P_i, \iota_i, o_i)} : {}^{in}_{out}}
$$

Fig. 5. Linear type checking.
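
The EnterBarrier example can be turned into an executable instance of the linear action check. The sketch below is a simplification under assumptions: permissions are modeled as `Counter` multisets, only the linear global mutatorsInBarrier, the linear-in input i, and the linear output p are considered (there are no pending asyncs), and the helper names `check_action`, `enter_barrier`, and `buggy_enter_barrier` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, FrozenSet, Tuple

Perm = Tuple[str, int]
# An action maps (mutatorsInBarrier, i) to (mutatorsInBarrier', p).
ActionFn = Callable[[FrozenSet[int], int], Tuple[FrozenSet[int], Perm]]


def c1(i: int) -> Counter:
    return Counter([("Left", i), ("Right", i)])


def c2(ids) -> Counter:
    return Counter(("Left", i) for i in ids)


def c3(p: Perm) -> Counter:
    return Counter([p])


def is_set(ms: Counter) -> bool:
    return all(n == 1 for n in ms.values())


def sub_multiset(a: Counter, b: Counter) -> bool:
    return all(b[k] >= n for k, n in a.items())


def enter_barrier(in_barrier: FrozenSet[int], i: int):
    """State update of EnterBarrier that matters for permissions."""
    return in_barrier | {i}, ("Right", i)


def buggy_enter_barrier(in_barrier: FrozenSet[int], i: int):
    """Variant that hands out Left(i) twice; the check should reject it."""
    return in_barrier | {i}, ("Left", i)


def check_action(action: ActionFn, in_barrier: FrozenSet[int], i: int) -> bool:
    in_perm = c2(in_barrier) + c1(i)        # linear global + linear-in input
    if not is_set(in_perm):
        return True                         # premise IsSet(inPerm) is false
    new_in_barrier, p = action(in_barrier, i)
    out_perm = c2(new_in_barrier) + c3(p)   # linear global + linear output
    return sub_multiset(out_perm, in_perm)


def check_all(action: ActionFn, n: int = 3) -> bool:
    ids = range(1, n + 1)
    subsets = [frozenset(c) for k in range(n + 1) for c in combinations(ids, k)]
    return all(check_action(action, s, i) for s in subsets for i in ids)
```

With these definitions `check_all(enter_barrier)` succeeds, while `check_all(buggy_enter_barrier)` fails because Left(i) would appear both in mutatorsInBarrier and in p.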


Linear Type Checking. Now that we can trust the linear interfaces of actions,
we need to ensure that the linear types in procedures "add up" w.r.t. control
flow and parameter passing. For every $P \in dom(ps)$ with body $s : {}^{in}_{out}$, we require
$in = li(P, \triangleright)$, $out = li(P, \triangleleft) \cup lo(P)$, and a derivation of $s : {}^{in}_{out}$ according to the
rules in Fig. 5, where $\iota(V)$ means $\biguplus_{v \in V} \iota(v)$. For example, in procedure Mutator
in Fig. 3 the linear input parameter i becomes unavailable at line 34, where it is
passed as linear-in. However, this call makes the local variable p available, such
that it can be passed as linear-in to the call on line 36. This call also passes i as
linear-out input, which makes i available again on line 37.
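
The Mutator example can be traced with a small checker for the exec rule of Fig. 5. This is a sketch under assumptions: it handles only straight-line sequences of exec statements, it computes the largest permitted `out` set instead of checking a given one, it uses plain set union where the rule uses disjoint union, and the names `Interface`, `check_exec`, and `check_sequence` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set, Tuple


class LinearityError(Exception):
    """A linear-in formal was bound to an unavailable actual."""


@dataclass(frozen=True)
class Interface:
    linear_in: FrozenSet[str] = frozenset()       # li(A, in): consumed inputs
    linear_out: FrozenSet[str] = frozenset()      # li(A, out): inputs made available
    linear_outputs: FrozenSet[str] = frozenset()  # lo(A): linear outputs


# exec (A, iota, o): interface, input map (formal -> actual), output map.
Exec = Tuple[Interface, Dict[str, str], Dict[str, str]]


def check_exec(stmt: Exec, available: Set[str]) -> Set[str]:
    """Apply the exec rule and return the variables available afterwards."""
    iface, iota, o = stmt
    consumed = {iota[f] for f in iface.linear_in}
    if not consumed <= available:
        missing = sorted(consumed - available)
        raise LinearityError(f"unavailable linear-in actuals: {missing}")
    returned = {iota[f] for f in iface.linear_out}
    produced = {o[f] for f in iface.linear_outputs}
    return (available - consumed) | returned | produced


def check_sequence(stmts: List[Exec], available: Set[str]) -> List[Set[str]]:
    """Thread availability through a sequence, as in the rule for s1 ; s2."""
    trace = [set(available)]
    for stmt in stmts:
        available = check_exec(stmt, available)
        trace.append(set(available))
    return trace


# Linear interfaces of the two actions used by Mutator in Fig. 3.
ENTER_BARRIER = Interface(linear_in=frozenset({"i"}),
                          linear_outputs=frozenset({"p"}))
WAIT_FOR_BARRIER_RELEASE = Interface(linear_in=frozenset({"p"}),
                                     linear_out=frozenset({"i"}))

enter = (ENTER_BARRIER, {"i": "i"}, {"p": "p"})                # line 34
wait = (WAIT_FOR_BARRIER_RELEASE, {"p": "p", "i": "i"}, {})    # line 36
```

Starting from `{"i"}`, checking `[enter, wait]` yields the availability trace i, then p, then i again, matching the description above; swapping the two statements raises `LinearityError` because p is not yet available.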


4.3 Safety 


In a yielding configuration $(g, T)$, every frame $(P, \ell, s)$ in $T$ is associated with a
set of invariant calls $\varphi$ as follows: $\varphi = pre(P)$ if $s$ is the entry of $P$, $\varphi = post(P)$
if $s$ is skip (the exit of $P$), or $\varphi = inv(\lambda)$ if $s$ is blocked at a call labeled with $\lambda$.
If $g \cdot \ell \models \varphi$ holds in every frame, then we call the configuration $\mathcal{Y}$-valid. To show
that this property is preserved across the execution of a yield-to-yield fragment
(i.e., a step in $\Rrightarrow$), the $Safety(P, \mathcal{Y}, \mathcal{L})$ judgment is decomposed into two kinds of
procedure-modular verification conditions: (1) a sequential check which ensures
that the next $\varphi$ in the executing frame is established, and (2) a noninterference
check which ensures that the $\varphi$'s in all other frames are preserved. Both checks
weave in linearity to enhance local reasoning.

Lemma 2. Let $c$ be an $\mathcal{L}$-valid, $\mathcal{Y}$-valid configuration of P. If $c \Rrightarrow c'$ then $c'$ is
$\mathcal{Y}$-valid.


Floyd Packages. For convenience, let pre(κ) be the set of all invariants and
preconditions of a call statement κ (and post(κ) analogously):

$$pre(call_A(Q_i, \iota_i, o_i)) = inv(A) \cup \Big(\bigcup_i \{(Y, \iota_i \circ \iota) \mid (Y, \iota) \in pre(Q_i)\}\Big)$$

$$post(call_A(Q_i, \iota_i, o_i)) = inv(A) \cup \Big(\bigcup_i \{(Y, (\iota_i \cup o_i^{-1}) \circ \iota) \mid (Y, \iota) \in post(Q_i)\}\Big)$$

For every yield-to-yield fragment {P | κ1} ē {κ2} of P ∈ dom(ps) we define a
Floyd package {P | φ | ll} ē {ψ}, which contains the invariants φ and linear
available variables ll before, and the invariants ψ after the yield-to-yield fragment.
The pair (φ, ll) is defined by cases on κ1, and ψ by cases on κ2:

  (φ, ll) = (pre(P), […])     if κ1 = ⊥
  (φ, ll) = ([…], out(κ1))    if κ1 ≠ ⊥
  ψ = post(P)                 if κ2 = ⊤
  ψ = […]                     if κ2 ≠ ⊤

Sequential Checking. For every Floyd package {P | φ | ll} ē {ψ} we check

$$\left.\begin{array}{l} g\cdot\ell \models \varphi \\ (g\cdot\ell,\ g'\cdot\ell',\ \Omega) \in Trans(\bar{e}) \\ IsSet(lc(g\cdot\ell,\ lg \cup ll)) \end{array}\right\} \implies g'\cdot\ell' \models \psi \ \wedge\ (\text{the preconditions of all pending asyncs in } \Omega \text{ hold})$$

After executing ē from a store with disjoint permissions that satisfies φ, it
must be the case that ψ and the preconditions of all created pending asyncs
hold. Notice that we can assume all gates of atomic actions when executing ē.
This is the case because yield invariants are not supposed to be strong enough to
prove P safe. Their purpose is to establish the context for refinement checking.


Noninterference Checking. For every Floyd package {P | φ | ll} ē {ψ} and
every yield invariant Y ∈ dom(ys) we check

$$\left.\begin{array}{l} g\cdot\ell \models \varphi \ \wedge\ g\cdot\ell' \models Y \\ (g\cdot\ell,\ g'\cdot\_,\ \_) \in Trans(\bar{e}) \\ IsSet(lc(g\cdot\ell,\ lg \cup ll) \uplus lc(\ell',\ li(Y, \triangleright))) \end{array}\right\} \implies g'\cdot\ell' \models Y$$

After executing ē from a store with disjoint permissions that satisfies
both φ and Y, it must be the case that Y still holds. A key ingredient that
makes our yield invariants powerful is the possibility to pass parameters to them
(ℓ′ above, which is the same before and after executing ē), together with the
possibility to give invariants a linear interface to include them in the disjointness
assumption. The reuse of named, parameterized invariants that are inductive
on their own facilitates ergonomic and modular proofs as well as a reduction in
the number of noninterference checks compared to location invariants.
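
How the disjointness premise discharges a noninterference obligation can be seen in a brute-force sketch over a tiny state space. Everything here is a hypothetical toy, not taken from the paper: a global `owner` field, a `release` action executed by thread t, the fragment invariant "owner == t", and a yield invariant Y(tid) stating "owner == tid".

```python
from collections import Counter
from itertools import product

TIDS = [0, 1]
OWNERS = [None, 0, 1]

def is_set(multiset):
    """IsSet: no permission occurs more than once."""
    return all(n <= 1 for n in multiset.values())

def release(owner, t):
    """Successor values of `owner` when thread t releases; blocked otherwise."""
    return [None] if owner == t else []

def noninterference_holds(assume_disjoint):
    """Check that Y(tid) survives every release step of a thread t."""
    for owner, t, tid in product(OWNERS, TIDS, TIDS):
        phi = owner == t           # invariant of the executing fragment
        y_before = owner == tid    # yield invariant of some other frame
        if not (phi and y_before):
            continue
        # Permissions of the executing thread and of the invariant's parameter.
        perms = Counter([t]) + Counter([tid])
        if assume_disjoint and not is_set(perms):
            continue               # premise violated, nothing to show
        for owner_after in release(owner, t):
            if owner_after != tid:
                return False       # Y(tid) was destroyed
    return True
```

With the disjointness premise, the only states satisfying both invariants have t equal to tid, which the IsSet condition rules out, so the check holds vacuously. Without it, the check fails.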


The example in Fig. 3 uses three yield invariants. BarrierInv states a global
property on barrierCounter and mutatorsInBarrier, MutatorInv states a property of
mutators on line 35, and CollectorInv states a property of the collector at lines 41
and 43 (notice the difference in the Boolean parameter). The linear parameters
to both MutatorInv and CollectorInv are essential to prove their noninterference.
For example, linearity discharges all noninterference obligations of CollectorInv
w.r.t. yield-to-yield fragments in procedure Collector; there cannot be two different
available variables i both holding thread identifier 0. CollectorInv is also stable
across the yield-to-yield fragments in procedure Mutator: by linearity, we know 
that EnterBarrier cannot execute if mutatorsInBarrier holds all mutator identi- 
fiers, and WaitForBarrierRelease is blocked when barrierOn is true. As an exam- 
ple of a sequential check, observe that the invariants at line 41 together with 
barrierCounter = 0 from executing WaitBarrier imply the invariants at line 43, in 
particular that mutatorsInBarrier holds all mutator identifiers. 


4.4 Refinement 


Recall that the goal of our proof rule is to transform a program P = (gs, as, ps) 
into a program P’ = (gs’, as’, ps’). So far, we showed how the two judgments 
Linearity(P, Y, L) and Safety(P, Y, L) establish properties on executions of P, 
using a linearity specification L and yield specification Y. In the remainder of
this section we show how the Refinement(P, Y, L, R, P’) judgment ties together 
P and P’ using a refinement specification R. 

Consider an execution step c ⇒ c′ of P. We want to say that there is a representative
step ĉ ⇒ ĉ′ in P′. Representative means that ĉ and ĉ′ are abstract
representations of c and c′, respectively. We capture this notion in an abstraction
mapping α, which maps every concrete configuration of P to an abstract configuration
of P′. Then the meaning of the judgment L, Y, R ⊢ P ~ P′ derived by
our proof rule is expressed in the following theorem.


Theorem 1. Let c be an L-valid, Y-valid configuration of P. (1) If c ⇒ ↯ then
α(c) ⇒ ↯. (2) If c ⇒ c′ then either α(c) = α(c′), α(c) ⇒ α(c′), or α(c) ⇒ ↯.


The safety of P’ should imply the safety of P. Thus, (1) states that any failure 
in P is preserved in P’. And (2) states that every step in P is matched with 
a (potentially stuttering) step or failure in P’. Hence, P’ can fail “more often” 
than P, but otherwise “behaves like” P. 


Refinement Specification. In a refinement specification R = (ref, mark), the 
refinement mapping ref is a partial map from dom(ps) to dom(as’). For every 
procedure P ∈ dom(ref), we check that P is abstracted by action A = ref(P).
Since our refinement checks are procedure-modular, we require dom(ref) to be 
closed under calls in ps (not including pending asyncs). In general, P executes 
multiple yield-to-yield fragments and possibly calls other procedures, while A 
executes in a single atomic step. Thus we need to ensure that exactly one yield-to- 
yield fragment in P behaves like A, while all other fragments have no visible side 
effect. We use a marking function mark to identify where A should happen in P. 
For every call statement with label A, mark(A) is either □ (“before”), ■ (“after”),
or the index i ∈ ℕ of some arm of the call. This means that we are still before A
when the call returns, that we are already after A when reaching the call, or that
arm i establishes A, respectively. Naturally, procedure entry and exit are marked
with □ and ■, respectively. Then the marks along every path of P must match
the regular expression □+ℕ?■*, which distinguishes two cases. (M1) No call is
marked with an index i ∈ ℕ. Then some yield-to-yield fragment switches from
□ to ■, which we will check to behave like A. All other yield-to-yield fragments and
calls on the path must have no side effect. (M2) Some call is marked with index
i ∈ ℕ. We will check that arm i of this call behaves like A, while all other calls
and yield-to-yield fragments on the path must have no side effect. Since we check 
mark per path, there are in general multiple occurrences of (M1) and (M2). 
In Fig. 2, the ref mapping is specified using the refines keyword. For example, 
procedure Acquire refines the atomic action AcquireSpec. The mark mapping is 
not explicitly specified, but we consider the call on line 28 to be marked with 1 
(the index of its only arm). Then one path through Acquire is marked with □■
and the other one with □1■, both matching the regular expression above.
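
The per-path condition on marks can be sketched as a regular-expression match. The encoding is an assumption made for illustration: "before" marks become `B`, "after" marks become `A`, and any arm index becomes `N`, so the pattern reads `B+N?A*`. The function name and mark representation are hypothetical.

```python
import re

# One or more "before" marks, at most one indexed arm, then "after" marks.
MARK_PATTERN = re.compile(r"B+N?A*")

def classify_path(marks):
    """Return "M1", "M2", or None for a sequence of marks along one path.

    Each mark is the string "before", the string "after", or an int (arm index).
    """
    encoded = "".join(
        "B" if m == "before" else "A" if m == "after" else "N" for m in marks
    )
    if MARK_PATTERN.fullmatch(encoded) is None:
        return None                      # path violates the marking discipline
    return "M2" if "N" in encoded else "M1"
```

For the two paths through Acquire described above, `classify_path(["before", "after"])` falls under case (M1) and `classify_path(["before", 1, "after"])` under case (M2).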


Program Rewriting. The program P = (gs, as, ps) is rewritten into P′ =
(gs′, as′, ps′) as follows. First, global variables can be hidden, such that gs′ ⊆ gs.
Second, new atomic actions can be added (for new abstractions of procedures)
and unreferenced ones removed, but for A ∈ dom(as) ∩ dom(as′) we require
as′(A) = as(A). Recall that an action can execute in any program that contains
the referenced global variables and procedures. Third, dom(ps′) = dom(ps)
and we rewrite every ps(P) = (I, O, L, s) into ps′(P) = (I′, O′, L′, s′) as follows.
Local variables can be hidden, such that I′ ⊆ I ∧ O′ ⊆ O ∧ L′ ⊆ L. If
P ∉ dom(ref), then s′ is like s, except that call arms (Q, ι, o) with ps′(Q) =
(I_Q, O_Q, _, _) turn into (Q, ι|_{I_Q}, o|_{O_Q}), with the requirement img(o) ∩ (O′ ∪ L′) =
img(o|_{O_Q}) that formal and actual outputs can only be hidden together. We
denote this rewriting of a statement by α(s). If P ∈ dom(ref), then s′ =
exec(ref(P), id(I′), id(O′)), where id(·) is the identity mapping on a given set of
variables. We denote this exec statement by α(P). Thus, procedures in dom(ref)
remain in P′, but with their bodies rewritten to a single exec of their abstraction.
Clearly, the action interface (as′ ∘ ref)(P) = (I′, O′, _, _) must match the
procedure, and L′ = ∅. Overall, P′ must still typecheck, which ensures, e.g.,
that the remaining actuals in input/output maps were not hidden.

In the first refinement step of Sect. 2.2, where the procedures in the second
column of Fig. 2 are abstracted to the atomic actions in the third column, the
global variable b is hidden. In the second refinement step, where procedure Incr
is abstracted to action IncrSpec, the input parameter tid and the global variable
l are hidden. Notice that, in order to chain together these two refinement steps,
we performed an auxiliary rewriting step in procedure Incr that converted call 
statements to exec statements. CIVL automatically performs this transformation 
as part of a refinement step, justified by a commutativity argument we explained 
in Sect. 2.2. However, this rewriting is not formalized as part of our refinement 
rule in this paper. 


Skip Action. In the following we assume a special action Skip that has no
inputs and outputs, does not modify global variables, and creates no pending
asyncs. Formally, as(Skip) = (∅, ∅, {ε}, {(ε, ε, ∅)}), where ε is the empty store.
Observe that safety verification (i.e., showing that the failure configuration ↯ is
unreachable) is a special case of refinement, where all global and local variables
are hidden, and all procedures are abstracted to Skip.


Abstraction Mapping. Figure 6 defines the abstraction mapping α. In a given
yielding configuration, we restrict the global store to gs′ and drop all trees
rooted in a node that refines Skip. The remaining nodes are traversed recursively,
where frames with P ∉ dom(ref) are rewritten as expected. The interesting case
is for nodes with P ∈ dom(ref). Such a node is turned into a leaf
(cutting off the remaining subtree) whose statement is either α(P) (the single
exec of ref(P)) or skip. Intuitively, to match the concrete steps of P (in the node
and its subnodes), the abstract configuration first stutters at α(P), then transitions
to skip when the effect of ref(P) happens, and then stutters at skip
until the return from the node. The delicate part is to determine if ref(P) happened
and to compute the local store for the abstract configuration. This is done by
the early-return function r. The function recurses on the unique path of marked
arms in calls, and either reports that we are still “before ref(P)” or returns a
local store ℓ (when “after ref(P)”). Suppose that this path consists of three nodes
with local stores ℓ1, ℓ2, ℓ3 (starting from the node with P ∈ dom(ref)), and that r
of the third node is ℓ3. Then r of the second node equals ℓ2 updated with the
return parameters from ℓ3, say ℓ2′, and similarly r of the first node equals ℓ1 updated with
the return parameters from ℓ2′, say ℓ1′, which is the local store for the abstract
configuration. Thus, r performs “early” return parameter passing, even though
we are still in the middle of executing procedures. To prove Theorem 1, our verification
conditions below have to ensure that throughout subsequent concrete
execution steps, the store computed by r for the first node remains unchanged.
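
The “early” return parameter passing along the path of marked arms can be sketched as a short recursion. This is a simplified illustration, not the definition from Fig. 6: the `Node` class, the `done` flag on the innermost frame, and the representation of output maps as dictionaries from callee outputs to caller variables are all assumptions, and the cases for calls marked “before” or “after” are omitted.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    store: dict                       # local store of this frame
    done: bool = False                # innermost frame only: effect has happened
    child: Optional["Node"] = None    # callee frame on the marked arm, if any
    out_map: Optional[dict] = None    # callee output name -> caller variable

def early_return(node):
    """Return None while "before", else the caller-side store after early return."""
    if node.child is None:
        return dict(node.store) if node.done else None
    inner = early_return(node.child)
    if inner is None:
        return None                   # effect has not happened further down
    updated = dict(node.store)
    for formal, actual in node.out_map.items():
        updated[actual] = inner[formal]   # pass return parameters upwards
    return updated
```

For a three-node path, the store of the outermost node is updated with values that have not yet been returned by the inner calls, which matches the idea that the abstract configuration already sees the final outputs.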


Refinement Packages. In a procedure P ∈ dom(ref), the effect of the abstract
action ref(P) can happen either in a yield-to-yield fragment directly in P, or 
nested inside another called procedure. To handle (potentially recursive) proce- 
dure calls during refinement, we decompose the problem into procedure-modular 
checks. Recall that the marking function mark identifies yield-to-yield fragments 
and call arms in P that should behave like the abstract action ref (P). Conversely, 
all other yield-to-yield fragments and call arms should have no side effect, which 
is to say that they should behave like Skip. Hence we have a refinement obliga- 
tion for every yield-to-yield fragment and every call arm in P, where refinement 
is either checked against ref(P) or Skip. We capture all these refinement obligations
uniformly in refinement packages of the form {P | φ | ll} ē {A}, where P is
the procedure we check refinement for, φ is a set of invariant calls and ll a set
of available variables we can assume, ē is an exec sequence denoting the effect
we check refinement for, and A is the action we check refinement against.


(R1) Refinement Packages for Yield-to-Yield Fragments. For every procedure
P ∈ dom(ref) and yield-to-yield fragment {P | κ1} ē {κ2} of P we define the
refinement package {P | φ | ll} ē {A} where φ and ll are defined the same as
for Floyd packages, and A = ref(P) if mark(κ1) = □ and mark(κ2) = ■, or
A = Skip otherwise. This case is rather straightforward. We proved the validity
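
Abstraction of configuration

$$\alpha((g, T)) = (g|_{gs'},\ \{\alpha(t) \mid t \in T \wedge root(t) = P \wedge ref(P) \neq Skip\})$$

Abstraction of thread tree
(For the definitions of α(s) and α(P), see program rewriting.)

$$\begin{aligned}
\ell|_P &= \ell|_{I \cup O \cup L} && \text{if } ps'(P) = (I, O, L, \_) \\
\alpha(Lf\,(P, \ell, s)) &= Lf\,(P, \ell|_P, \alpha(s)) && \text{if } P \notin dom(ref) \\
\alpha(Nd\,(P, \ell, s)\ \vec{t}) &= Nd\,(P, \ell|_P, \alpha(s))\ \alpha(\vec{t}) && \text{if } P \notin dom(ref) \\
\alpha(Lf\,(P, \ell, s)) &= Lf\,(P, \ell|_P, s') && \text{if } P \in dom(ref) \\
\alpha(Nd\,(P, \ell, s)\ \vec{t}) &= Lf\,(P, \ell'|_P, s') && \text{if } P \in dom(ref)
\end{aligned}$$

In the leaf case for P ∈ dom(ref), s′ = α(P) if s ≠ skip, and s′ = skip if s = skip.
In the node case for P ∈ dom(ref), (s′, ℓ′) = (α(P), ℓ) if r reports “before ref(P)”
for the node, and (s′, ℓ′) = (skip, ℓ″) if r returns a store ℓ″ for the node.

Early-return computation

[The defining equations of r did not survive extraction. The legible parts show that, for a node Nd (P, ℓ, C[call_A(Q_i, ι_i, o_i)]) with subtrees t_i, the cases are split on mark(A), and that for mark(A) = i with r(t_i) a store, the result is ℓ updated with r(t_i) ∘ o_i⁻¹.]

Fig. 6. Abstraction mapping from configurations of P to configurations of P′.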


of φ and ll before the fragment, and need to check that the code ē in the fragment
behaves either like ref(P) or Skip.


(R2) Refinement Packages for Call Arms. For every procedure P ∈ dom(ref) and
$call_A(Q_i, \iota_i, o_i) : {}^{in}_{out}$ in P, let φ = inv(A) and ll = in \ ⋃_i ι_i(li(Q_i, ▷)). At a call
we know the validity of the invariants attached to the call and the availability
of in minus the linear variables passed into the callees. Then for every arm
(Q_i, ι_i, o_i), let A_i = ref(P) if mark(A) = i or A_i = Skip otherwise. Now the final
missing ingredient for a refinement package {P | φ | ll} ē {A_i} for every arm i is
the effect ē for which we check refinement against A_i. To obtain a modular check,
our solution is to use the abstract action specification of the callee Q_i. Formally,
ē = exec(B_i, ι_i|_I, o_i|_O) for B_i = ref(Q_i) with as′(B_i) = (I, O, _, _). Recall that
this is well-defined, since dom(ref) is closed under calls. Notice that using the 
specification of a callee while checking the specification of a caller is akin to 
reasoning with procedure pre- and postconditions, where circular dependencies 
are resolved via induction on the nesting depth. 


Recall (from the end of Sect. 3) that procedure Acquire in Fig. 2 has three
yield-to-yield fragments: (A1), (A2), (A3). Each fragment induces an (R1)-type
refinement package, where (A1) is checked against AcquireSpec, while both (A2)
and (A3) are checked against Skip. Furthermore, the call on line 28 induces an
(R2)-type refinement package against AcquireSpec. 


Refinement Checking. The Refinement(P, Y, L, R, P′) judgment requires
every refinement package {P | φ | ll} ē {A} to be discharged as follows. Let
e = exec(A, id(I), id(O)) for as′(A) = (I, O, _, _) be the abstract effect we
check refinement against, let V = gs′ ∪ I′ ∪ O′ for (as′ ∘ ref)(P) = (I′, O′, _, _) be
the non-hidden variables in the scope of the refinement package, and check

$$\left.\begin{array}{l} g\cdot\ell \models \varphi \\ IsSet(lc(g\cdot\ell,\ lg \cup ll)) \end{array}\right\} \implies \left\{\begin{array}{l} g\cdot\ell \in Gate(e) \implies g\cdot\ell \in Gate(\bar{e}) \\ (g\cdot\ell,\ g'\cdot\ell',\ \Omega) \in Gate(e) \circ Trans(\bar{e}) \implies \\ \quad \exists\, \hat{g}, \hat{\ell}, \hat{g}', \hat{\ell}' :\ (\hat{g}\cdot\hat{\ell},\ \hat{g}'\cdot\hat{\ell}',\ \Omega|_{ref}) \in Trans(e) \\ \qquad \wedge\ g\cdot\ell|_V = \hat{g}\cdot\hat{\ell}|_V \ \wedge\ g'\cdot\ell'|_V = \hat{g}'\cdot\hat{\ell}'|_V \end{array}\right.$$

where Ω|_ref = {(ℓ, Q) ∈ Ω | ref(Q) ≠ Skip}.


We assume a store g·ℓ that satisfies invariants and linear disjointness according
to the refinement package. Then refinement consists of two parts, failure
preservation and behavior preservation. First, if ē can fail in the concrete
then e must also fail in the abstract. Second, if e cannot fail in the abstract
and ē can transition to store g′·ℓ′ while creating pending asyncs Ω in the concrete,
then there must be a matching transition of e in the abstract. Here matching
means that e starts in a store ĝ·ℓ̂ that agrees with g·ℓ on the non-hidden variables
V, ends in a store ĝ′·ℓ̂′ that agrees with g′·ℓ′ on V, and creates the same pending
asyncs except the ones to procedures abstracted to Skip.
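
The two parts of the check, failure preservation and behavior preservation, can be sketched by brute force over a small finite state space. This is an illustrative toy under stated assumptions, not CIVL's encoding: stores are pairs (count, tmp) where tmp plays the role of a hidden variable, pending asyncs are ignored, and all function names are hypothetical.

```python
from itertools import product

COUNTS = range(4)
TMPS = range(2)
STATES = list(product(COUNTS, TMPS))   # store = (count, tmp); tmp is hidden

def visible(store):
    """Projection onto the non-hidden variables V (here: count only)."""
    return store[0]

def abs_gate(store):
    return store[0] < 2                # the abstract action fails "more often"

def abs_trans(store):
    count, _ = store
    return [(count + 1, t) for t in TMPS] if count + 1 in COUNTS else []

def conc_gate(store):
    return store[0] < 3

def conc_trans(store):
    count, tmp = store
    return [(count + 1, 1 - tmp)] if count + 1 in COUNTS else []

def refines(c_gate, c_trans, a_gate, a_trans, invariant=lambda s: True):
    for s in STATES:
        if not invariant(s) or not a_gate(s):
            continue                   # abstract failure matches anything
        if not c_gate(s):
            return False               # concrete failure is not preserved
        for s2 in c_trans(s):
            matched = any(
                visible(a) == visible(s) and visible(a2) == visible(s2)
                for a in STATES for a2 in a_trans(a)
            )
            if not matched:
                return False           # concrete step has no abstract match
    return True
```

Here `refines(conc_gate, conc_trans, abs_gate, abs_trans)` succeeds because every concrete increment is matched by an abstract increment that agrees on `count`, while changes to the hidden `tmp` are invisible.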


5 Implementation 


CIVL is a refinement-based verifier for concurrent programs built on top of the 
widely-used Boogie intermediate verification language. The Boogie [6] verifier 
provides infrastructure for compiling annotated sequential procedures into log- 
ical verification conditions whose validity is checked by a satisfiability-modulo- 
theories solver. CIVL is implemented as an extension of Boogie, which takes 
as input an annotated layered concurrent program [25] (in a language whose 
core is RefPL), performs concurrency-specific type checking and static analyses, 
and then encodes all the verification conditions of its proof rule into a standard 
sequential Boogie program. CIVL can therefore be understood as a compiler that
eliminates concurrency in a RefPL program by translating it down to a collection 
of sequential procedures, thus reusing the rest of the Boogie pipeline unchanged. 

The open-source CIVL verifier is a stable tool which is part of the master 
branch [2] and public release [1] of Boogie. CIVL has over 100 regression tests
comprising both realistic programs and microbenchmarks. There are many pub- 
lished papers [9,26,27,33,39] that describe nontrivial examples verified using 
CIVL, most written by researchers other than the developers of CIVL. The code
in CIVL is extensible; entirely new tactics for rewriting concurrent programs have 
been added to it [24,26]. Finally, CIVL is designed for interactive program devel- 
opment. It is fast and provides several command-line flags to focus verification 
on parts of the program. CIVL has fine-grained error reporting including error
traces, which attributes a verification failure to a particular check, local to a 
small part of the program. This helps the programmer to debug and iteratively 
improve both implementation and specification. 

An early version of the CIVL verifier was reported by Hawblitzel et al. [18]. 
The implementation of the techniques described in this paper has been done as 
part of the new design and implementation of CIVL based on the framework of 
layered concurrent programs [25]. In the rest of this section, we will continue to 
use CIVL to refer to our new implementation. We now present an overview of 
the different parts of the verifier. 


Type Checking. In addition to the standard type checking of a Boogie program, 
the CIVL type checker performs several extra checks. First, it checks that the 
layer specifications [25] on program elements such as global and local variables, 
atomic actions, and procedures are correct. Second, it checks using a dataflow 
analysis that it is sufficient to reason about the safety of cooperative semantics. 
This analysis exploits mover type [14] annotations on atomic actions to reason
that yield-to-yield code fragments satisfy the requirements of Lipton reduction [30].
It also generates logical verification conditions whose validity guarantees
the correctness of the mover annotations on atomic actions.


Linearity Checking. The CIVL linearity checker implements the method
described in Sect. 4.2 in two parts. First, it creates for each atomic action a
sequential procedure which verifies that the multiset of outgoing permissions is 
a subset of the multiset of incoming permissions. We use the generalized array 
theory [31] to encode multisets, and the IsSet constraint in particular. Second, it 
type checks each procedure to compute the set of available variables at each con- 
trol location and to verify that linear interfaces of called procedures and atomic 
actions are used appropriately. 


Safety Checking. The CIVL safety checker implements the method described 
in Sect. 4.3. Unlike the formal description which enumerates yield-to-yield code 
fragments, the implementation is efficient, encodes all code fragments in a RefPL 
procedure into a single sequential procedure with maximal sharing, and adds 
the safety checks by injecting instrumentation code and assertions into a cloned 
copy of the original procedure. To express the noninterference check, we add 
instrumentation variables that take snapshots of global and output variables at 
every yield. Furthermore, the generalized array theory is used here as well to 
record the pending asyncs created in a yield-to-yield code fragment, such that 
their preconditions can be checked. 


Refinement Checking. The CIVL refinement checker implements the method 
described in Sect. 4.4. Similar to safety checking, the refinement checks are added 
as instrumentation to procedure copies. At every yield, snapshot variables (similar
to those used for noninterference) are used to refer to the state at the previous yield
when asserting the appropriate transition relation. CIVL computes a representation
of the transition relation of an atomic action as a logical formula from the
user-provided representation as imperative code.


6 Conclusions


In this paper, we provide a foundation for refining structured concurrent pro- 
grams and an implementation in the CIVL verifier. The contribution of this 
paper, and that of CIVL in general, is the capability to express new proofs with
significant advantages for the programmer in terms of proof structuring, anno- 
tation effort, and tool performance. 


Abstract. Inspired by distributed applications that use consensus or
other agreement protocols for global coordination, we define a new com- 
putational model for parameterized systems that is based on a general 
global synchronization primitive and allows for global transition guards. 
Our model generalizes many existing models in the literature, including 
broadcast protocols and guarded protocols. We show that reachability 
properties are decidable for systems without guards, and give sufficient 
conditions under which they remain decidable in the presence of guards. 
Furthermore, we investigate cutoffs for reachability properties and pro- 
vide sufficient conditions for small cutoffs in a number of cases that are 
inspired by our target applications. 


1 Introduction 


Distributed applications are notoriously difficult to implement and reason about, 
primarily due to the combinatorial explosion of behaviors resulting from the 
interleaving of computation and communication. Naturally, they have received 
a lot of attention from the formal methods community to facilitate reasoning 
about correctness properties that are too complex to reason about informally or 
manually [3, 7, 14, 15, 34, 36, 42, 46, 50, 52, 55].

One of the main challenges in fully automated reasoning about a distributed 
system is scalability in a critical system parameter—the number of processes— 
with the epitome of success being parameterized verification of correctness— 
correctness that holds regardless of this parameter. Unfortunately, the param- 
eterized verification problem is known to be undecidable even in very simple 
cases, for example, finite-state processes that pass a 2-valued token in a ring [54]. 
Hence, approaches for parameterized verification are divided into two groups: (i) 
ones that support a large class of systems, but only provide semi-decision procedures
[1,41] and (ii) ones that provide fully automatic decision procedures for a
well-defined class of systems, but need to carefully restrict this class of systems 
to obtain such a strong result. While the former cannot provide any guarantee of 
success, the latter are often not sufficiently general to model practical examples. 

In this work, we target fully-automated parameterized verification for a sig- 
nificantly more general class of systems than addressed in prior work (cf. the 
surveys [9,21,26]). Inspired by distributed applications that use consensus or 
other agreement protocols for global coordination, we introduce global synchro- 
nization protocols, a new computational model for distributed systems that gen- 
eralizes most of the existing models based on process synchronization, including 
models based on pairwise rendezvous [32], asynchronous rendezvous [16], nego- 
tiation [27] and broadcasts [28]. We show that despite this generality, we can 
still decide parameterized verification for safety properties. Going beyond that, 
we show that under certain conditions, our model can be augmented with global
transition guards (which allow modeling semaphore-based access control as well
as preconditions for global consensus-like coordination) while retaining decidability.
This makes our model one of the most expressive models for which the
parameterized verification problem is still decidable. Furthermore, we present 
several results on cutoffs for our model, i.e., the number of processes sufficient 
to prove or disprove properties of a parameterized system. Inspired both by 
the decision procedure and by negative examples that require large cutoffs, we 
define sufficient conditions on systems in our computational model that make 
small, practical cutoffs possible. Finally, we evaluate our approach on several 
distributed applications, showing that they can indeed be modeled as global 
synchronization protocols, and we illustrate the significance of our cutoff results 
in the verification of these benchmarks. 


Motivating Example. Our system model is inspired by applications that use 
agreement protocols, like leader election or consensus, as building blocks to 
achieve a more complex overall functionality. We are interested in a compo- 
sitional verification setting where we assume that the agreement protocols have 
been verified separately and want to guarantee the overall correctness of an 
application without having to explicitly model and verify the agreement proto- 
cols within the application; in particular, we focus on a setting where verified 
agreement protocols are encapsulated into an abstraction with precondition obli- 
gations and postcondition guarantees. 

Thus, our system model needs to be able to incorporate such pre- and post- 
conditions of agreement protocols. As a simple example, consider the smoke 
detector application in Fig. 1 whose intended behavior is as follows. Upon detect- 
ing smoke, the processes coordinate to choose (up to) 2 processes to report the 
smoke to the fire department. It uses different types of transitions, several of 
which are popular in the literature and are supported by existing decidability 
results: an internal transition (from state ENV to state ASK), a broadcast (on 
action Smoke), and a negotiation, i.e., a synchronous transition of all processes 

[Figure: state diagram of the smoke detector process. Transition labels: Smoke!!, G1; Choose1!!, G2; Choose2!!, G2; Choose??; Reset, G3.]

G1 = {ENV, ASK}
G2 = {PICK, IDLE}
G3 = {REPORT, IDLE}


Fig. 1. A smoke detector process. The internal transition from initial state ENV to ASK 
models that a process detects smoke (an environment signal). A process that detected 
smoke can initiate a broadcast Smoke, moving all processes from ENV to IDLE and
from ASK to PICK, where the transition Choose moves (up to) 2 processes to REPORT, 
and the rest from PICK to IDLE. Finally, all processes from REPORT and IDLE may move 
back to ENV in a synchronous transition with no dedicated sender. Transitions labeled 
with a set G; can only be taken if all processes are in this set. The safety property for 
a distributed smoke detector based on this process is that at most 2 processes should 
report the fire. 


with no distinguished sender (on action Reset). However, additionally our appli- 
cation requires that some transitions can only happen under certain conditions, 
given by guards G; in transition labels. For example, action Reset should only 
be possible if all processes are in G3, i.e., in states REPORT or IDLE. And most 
importantly, in state PICK we want the system to agree on (up to) 2 processes 
that move into state REPORT. This requires a novel type of transition that we
have not found in existing literature, allowing two processes to take a distin- 
guished role while all other processes are treated uniformly. To faithfully model 
agreement of processes, we also require a guard on this transition, since any 
agreement protocol is based on the assumption that all processes are ready (i.e., 
their local state satisfies some condition) before invocation of the protocol. 


2 System Model: Global Synchronization Protocols 


We present global synchronization protocols (GSPs), a formal system model 
that generalizes most of the existing synchronization-based models in the lit- 
erature [16,27,28,32], including models based on rendezvous and broadcasts. In 
this model, each global transition synchronizes all processes, where an arbitrary 
number k of processes act as the senders of the transitions, while the remain- 
ing processes react uniformly as receivers. The model supports two basic types 
of transitions: (i) a k-sender transition, which can fire only if at least k pro- 
cesses are ready to act as senders, and is fired with exactly k processes acting as 
senders, and (ii) a k-maximal transition, which can fire if the number m of processes
that are ready to act as senders is at least 1, and is fired with min(m, k)
processes acting as senders. Additionally, each transition can be equipped with
a global guard that identifies a subset of the local state space. Then, a transition 
is enabled whenever it can fire and the local states of all processes are in the set 
identified by the transition guard. 
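
The two transition types and the guard condition can be sketched on counter-style global states, where a state maps each local state to the number of processes in it. This is a minimal sketch under stated assumptions, not the formal definition: the function names are hypothetical, missing receive entries are treated as self-loops, and `fire_k_maximal` assumes that all sending transitions of the action share one source and one target state (as Choose does in Fig. 1).

```python
from collections import Counter

def guard_ok(q, guard):
    """A guard holds if every occupied local state lies in the guard set."""
    return guard is None or all(s in guard for s, n in q.items() if n > 0)

def move_receivers(rest, receive):
    """Apply the receiving transition to all non-sending processes."""
    out = Counter()
    for state, n in rest.items():
        out[receive.get(state, state)] += n   # missing entry = self-loop
    return out

def fire_k_sender(q, sends, receive, guard=None):
    """sends: list of (source, target) pairs, one per sending action."""
    need = Counter(src for src, _ in sends)
    if not guard_ok(q, guard) or any(q[s] < n for s, n in need.items()):
        return None                            # not enabled
    rest = q - need                            # remove exactly k senders
    return move_receivers(rest, receive) + Counter(dst for _, dst in sends)

def fire_k_maximal(q, src, dst, k, receive, guard=None):
    m = q[src]                                 # processes ready to send
    if not guard_ok(q, guard) or m < 1:
        return None                            # not enabled
    n_send = min(m, k)
    rest = q - Counter({src: n_send})
    return move_receivers(rest, receive) + Counter({dst: n_send})
```

For the smoke detector, Choose moves up to 2 processes from PICK to REPORT and the rest from PICK to IDLE, under guard G2 = {PICK, IDLE}. Starting from three processes in PICK and two in IDLE, `fire_k_maximal(Counter(PICK=3, IDLE=2), "PICK", "REPORT", 2, {"PICK": "IDLE"}, {"PICK", "IDLE"})` yields two processes in REPORT and three in IDLE.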

We formalize these notions in the following, starting with the case without 
transition guards. 


2.1 Global Synchronization Without Guards 


Unguarded Processes. An unguarded process is a labeled transition system
P = ⟨A, S, s0, T⟩, where A is a set of local actions, S is a finite set of states,
s0 ∈ S is the initial state, and T ⊆ S × A × S is the transition relation. A is
based on a set 𝒜 of global actions, where each a ∈ 𝒜 has an arity k ≥ 1 and is
either a k-sender action or a k-maximal action. For every global action a ∈ 𝒜
with arity k, A contains local actions a_1!!, …, a_k!!, a??. Actions a_1!!, …, a_k!! are
called sending actions and a?? is called a receiving action.

A local transition from state s to state s′ on local action α ∈ A, denoted
s −α→ s′, is called a sending transition (resp., receiving transition) if α is a sending
action (resp., receiving action). We assume that receives are deterministic: for
each state s and each receiving action a??, there is exactly one state s′ with
s −a??→ s′, and that sends are unique: for each sending action a_i!! there is exactly
one pair of states s, s′ with s −a_i!!→ s′.¹

Example 1. If we ignore guards on transitions, the process in Fig. 1 is an
unguarded process. Global action Choose has arity 2 and local sending transitions
PICK −Choose_i!!→ REPORT for i ∈ {1, 2}. One local receiving transition is
PICK −Choose??→ IDLE, and all other receiving transitions on Choose are self-loops
(not depicted).


Unguarded Systems. Given an unguarded process $P = \langle A, S, s_0, T \rangle$, we consider
systems composed of $n$ identical processes, and use a counter abstraction
to efficiently represent global states, without loss of precision [25].²

That is, the parameterized global transition system is defined as
$M(n) = \langle \mathcal{A}, Q, q_0, \rightarrow \rangle$, where $Q = \{0, \ldots, n\}^S$, i.e., a global state is a function
$q : S \to \{0, \ldots, n\}$. Assuming a fixed order on $S$, we will also use $q$ as a vector of natural
numbers. The initial state $q_0$ is the state with $q_0(s_0) = n$ and $q_0(s) = 0$ for all
$s \neq s_0$. Finally, we define the global transition relation $\rightarrow$, separated into the
two different types of actions:


1 Processes that do not satisfy the assumptions can easily be rewritten to satisfy them, 
e.g. by adding self-loops on any missing receive actions, and by renaming the actions 
of duplicate sending transitions (and adding corresponding receiving transitions). 

2 For presentation clarity, we do not explicitly consider an environment process in our
model. All of our results extend to the case with an explicit environment process; 
see the extended version [38] for a justification. 
k-sender Actions. A k-sender action $a \in \mathcal{A}$ with local sending transitions
$s_i \xrightarrow{a_i!!} s'_i$ for $i \in \{1, \ldots, k\}$ can be fired from a global state $q$ if there are $k$ processes
that can take these local transitions. Upon firing the action, each of the local
transitions on actions $a_i!!$ is taken by exactly one process, and all other processes
take a transition on action $a??$ to arrive in the new global state $q'$. Formally, we
assign to each k-sender action $a \in \mathcal{A}$ (i) a vector $v_a \in Q$ containing the number
of expected senders for each state $t \in S$: $v_a(t) = |\{ s \xrightarrow{a_i!!} s' \mid s = t \}|$, (ii) a vector
$v'_a$ containing the number of senders that will be in each state $t \in S$ after the
transition: $v'_a(t) = |\{ s \xrightarrow{a_i!!} s' \mid s' = t \}|$, and (iii) a function $M_a : S \times S \to \{0, 1\}$,
where $M_a(s, s') = 1$ if there is a local transition $s \xrightarrow{a??} s'$, and $M_a(s, s') = 0$
otherwise. We also use $M_a$ as an $|S| \times |S|$ matrix, called the synchronization matrix
of action $a$.

Then, a transition from global state $q$ on action $a$ is possible if $q(s_i) \geq v_a(s_i)$
for all $i \in \{1, \ldots, k\}$, and the resulting global state can be computed as


$q' = M_a \cdot (q - v_a) + v'_a$,


and we write $q \xrightarrow{a} q'$. Intuitively, $q'$ is obtained from $q$ by “removing” the senders
from their local start states, moving all the remaining (receiving) processes to
their respective local destination states, and then adding the senders to their
appropriate local destination states. Note that this representation relies on the
assumption that sends are unique and receives are deterministic, which also
implies that each column of a synchronization matrix $M_a$ is a unit vector.


Example 2. Consider the process in Fig. 1. The synchronization matrix and vectors
for action Smoke are shown below, with global states given in the order
(ENV, ASK, IDLE, PICK, REPORT) (and abbreviated as (E, A, I, P, R)).

Notice, for instance, that the first column in $M_{Smoke}$ encodes the local receive
transition ENV $\xrightarrow{Smoke??}$ IDLE. The vector-pair $v_{Smoke}$ and $v'_{Smoke}$ encode the
local send transition ASK $\xrightarrow{Smoke_1!!}$ PICK. In particular, $v_{Smoke}$ indicates that
the sender starts in ASK and $v'_{Smoke}$ indicates that the sender moves to PICK.

\[
M_{Smoke} =
\begin{array}{c|ccccc}
  & E & A & I & P & R \\ \hline
E & 0 & 0 & 0 & 0 & 0 \\
A & 0 & 0 & 0 & 0 & 0 \\
I & 1 & 0 & 1 & 0 & 0 \\
P & 0 & 1 & 0 & 1 & 0 \\
R & 0 & 0 & 0 & 0 & 1
\end{array}
\qquad
v_{Smoke} =
\begin{array}{c|c}
E & 0 \\ A & 1 \\ I & 0 \\ P & 0 \\ R & 0
\end{array}
\qquad
v'_{Smoke} =
\begin{array}{c|c}
E & 0 \\ A & 0 \\ I & 0 \\ P & 1 \\ R & 0
\end{array}
\]

Now, consider a global state (3,2,0,0,0) with three processes in ENV and two in
ASK. From this state, the transition $(3,2,0,0,0) \xrightarrow{Smoke} (0,0,3,2,0)$ is enabled
(since there is at least 1 sender in ASK), where all three processes in ENV act as
receivers to move to IDLE (according to the synchronization matrix $M_{Smoke}$),
one process in ASK acts as the sender to move to PICK, and the other process
in ASK acts as a receiver, also moving to PICK.
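
The firing rule for k-sender actions can be replayed mechanically. The following is a minimal Python sketch, not part of the formal development: the function names `mat_vec` and `fire_k_sender` are hypothetical, and global states are plain lists in the order (ENV, ASK, IDLE, PICK, REPORT). It uses the matrix and vectors of action Smoke from Example 2.

```python
# Sketch of q' = M_a * (q - v_a) + v'_a for a k-sender action.
# Names are illustrative; states are ordered (ENV, ASK, IDLE, PICK, REPORT).

def mat_vec(m, x):
    """Multiply matrix m (list of rows) with vector x."""
    return [sum(row[j] * x[j] for j in range(len(x))) for row in m]

def fire_k_sender(q, m, v, v_prime):
    """Return the successor of q, or None if the action is not enabled."""
    # Enabled only if every sender state holds enough processes: q >= v_a.
    if any(qs < vs for qs, vs in zip(q, v)):
        return None
    receivers = [qs - vs for qs, vs in zip(q, v)]   # remove the senders
    moved = mat_vec(m, receivers)                   # receivers take a??
    return [a + b for a, b in zip(moved, v_prime)]  # add senders at targets

M_SMOKE = [
    [0, 0, 0, 0, 0],  # ENV
    [0, 0, 0, 0, 0],  # ASK
    [1, 0, 1, 0, 0],  # IDLE
    [0, 1, 0, 1, 0],  # PICK
    [0, 0, 0, 0, 1],  # REPORT
]
V_SMOKE = [0, 1, 0, 0, 0]        # one sender expected in ASK
V_SMOKE_PRIME = [0, 0, 0, 1, 0]  # the sender ends up in PICK

print(fire_k_sender([3, 2, 0, 0, 0], M_SMOKE, V_SMOKE, V_SMOKE_PRIME))
```

With no process in ASK, for example from (3,0,0,0,0), the function returns `None`, reflecting that Smoke needs one sender.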
k-maximal Actions. A k-maximal action $a \in \mathcal{A}$ with local sending transitions
$s_i \xrightarrow{a_i!!} s'_i$ for $i \in \{1, \ldots, k\}$ can be fired from a global state $q$ if there is at least
one process that can take one of these local transitions. Upon firing the action,
for each state $s_i$ with at least one local transition $s_i \xrightarrow{a_i!!} s'_i$, (i) if $q(s_i) \geq v_a(s_i)$
then each of the local transitions $s_i \xrightarrow{a_i!!} s'_i$ is taken by exactly one process, or,
(ii) if $q(s_i) < v_a(s_i)$ then a total of $q(s_i)$ of the local transitions $s_i \xrightarrow{a_i!!} s'_i$ are
taken, each by exactly one process. All other processes take a transition on the
receiving action $a??$ to arrive in the new global state $q'$. Formally, we again assign
to each action $a$ vectors $v_a$, $v'_a$, and a synchronization matrix $M_a$, as above. If
$q(s_i) \geq v_a(s_i)$ for all $i \in \{1, \ldots, k\}$, then these are used as defined above. For
cases where this does not hold, we assign to the action an additional set of
vector-pairs $(u_a, u'_a)$ with different numbers of senders that actually participate,
and $q'$ is computed based on a vector-pair with the maximal number of senders
that is supported by $q$.


Example 3. The synchronization matrix and vectors for action Choose are
shown below. Note that, if Choose is a 2-maximal action, then the vector-pair
$(u_{Choose}, u'_{Choose})$ is used to model the case where only one sender is available
to take the sending transition.

\[
M_{Choose} =
\begin{array}{c|ccccc}
  & E & A & I & P & R \\ \hline
E & 1 & 0 & 0 & 0 & 0 \\
A & 0 & 1 & 0 & 0 & 0 \\
I & 0 & 0 & 1 & 1 & 0 \\
P & 0 & 0 & 0 & 0 & 0 \\
R & 0 & 0 & 0 & 0 & 1
\end{array}
\qquad
\begin{array}{c|cccc}
  & u_{Choose} & u'_{Choose} & v_{Choose} & v'_{Choose} \\ \hline
E & 0 & 0 & 0 & 0 \\
A & 0 & 0 & 0 & 0 \\
I & 0 & 0 & 0 & 0 \\
P & 1 & 0 & 2 & 0 \\
R & 0 & 1 & 0 & 2
\end{array}
\]


Regardless of whether Choose is a 2-sender or a 2-maximal action, the
global transition $(0,0,1,4,0) \xrightarrow{Choose} (0,0,3,0,2)$ is possible. In a state $q =
(0,0,4,1,0)$, with 4 processes in IDLE and 1 in PICK, the Choose action will not
be enabled if it is a 2-sender action because two sending processes are required
(in PICK), but only one sender is available. However, if Choose is a 2-maximal
action, then the global transition $(0,0,4,1,0) \xrightarrow{Choose} (0,0,4,0,1)$ is possible.
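
The difference between the two action types can be made concrete with a small Python sketch. The names `fire_k_maximal` and `k_sender_enabled` are hypothetical, and sending transitions are given as (source, target) index pairs. One simplifying assumption, not taken from the text: when a source state holds fewer processes than it has sending transitions, the sketch fires the first listed transitions, although in general this is a nondeterministic choice. For Choose the choice is irrelevant, since both sends go from PICK to REPORT.

```python
# Sketch of a k-maximal firing step; states ordered (ENV, ASK, IDLE, PICK, REPORT).

def mat_vec(m, x):
    """Multiply matrix m (list of rows) with vector x."""
    return [sum(row[j] * x[j] for j in range(len(x))) for row in m]

def k_sender_enabled(q, sends):
    """A k-sender action needs one distinct process per sending transition."""
    need = [0] * len(q)
    for src, _ in sends:
        need[src] += 1
    return all(qs >= ns for qs, ns in zip(q, need))

def fire_k_maximal(q, m, sends):
    """Fire with as many senders as q supports; None if no sender exists."""
    remaining = list(q)       # becomes q - u_a
    u_prime = [0] * len(q)    # senders counted at their target states
    for src, dst in sends:
        if remaining[src] > 0:    # assumption: first listed transitions fire
            remaining[src] -= 1
            u_prime[dst] += 1
    if sum(u_prime) == 0:
        return None
    moved = mat_vec(m, remaining)   # all other processes take a??
    return [a + b for a, b in zip(moved, u_prime)]

M_CHOOSE = [
    [1, 0, 0, 0, 0],  # ENV
    [0, 1, 0, 0, 0],  # ASK
    [0, 0, 1, 1, 0],  # IDLE
    [0, 0, 0, 0, 0],  # PICK
    [0, 0, 0, 0, 1],  # REPORT
]
CHOOSE_SENDS = [(3, 4), (3, 4)]  # Choose_1!! and Choose_2!!: PICK -> REPORT

print(fire_k_maximal([0, 0, 1, 4, 0], M_CHOOSE, CHOOSE_SENDS))
print(fire_k_maximal([0, 0, 4, 1, 0], M_CHOOSE, CHOOSE_SENDS))
print(k_sender_enabled([0, 0, 4, 1, 0], CHOOSE_SENDS))
```

The three calls correspond to the three cases discussed in Example 3.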


Runs, Reachability Properties. A run of system $M(n)$ is a finite or infinite
sequence of global states $q_0 q_1 \ldots$, where $q_0$ is the initial state and $q_i \rightarrow q_{i+1}$
for all $i$. We say that a state $q$ is reachable in $M(n)$ if there is a run of $M(n)$
that ends in $q$. For a fixed $m \in \mathbb{N}$ and local state $s \in S$, let $\varphi_m(s)$ be a property
denoting the reachability of a global state $q$ with $q(s) \geq m$. If such a state is
reachable in $M(n)$, we write $M(n) \models \varphi_m(s)$.


Other Communication Primitives in the GSP Model. Note that most
of the synchronization-based communication primitives from the literature are
instances of k-sender transitions or k-maximal transitions: broadcasts [28] are
simply 1-sender transitions, internal transitions are 1-sender transitions with
$M_a = \mathrm{Id}$ (the identity matrix), pairwise rendezvous transitions [32] are 2-sender
transitions (denoting the sender and receiver of the rendezvous transition) with
$M_a = \mathrm{Id}$, and asynchronous rendezvous transitions [16] are 2-maximal transitions
with $M_a = \mathrm{Id}$. Negotiations [27], i.e., a synchronous transition of all processes
with no distinguished sender, can be modeled as a set of 1-sender transitions,
where every local receiving transition $s \xrightarrow{a??} s'$ is paired with a sending transition
$s \xrightarrow{a!!} s'$, allowing an arbitrary process to act as the sender. In addition to these,
GSPs allow us to express many other natural synchronization primitives, e.g.,
summarizing the election of (up to) k leaders in a single step.

Finally, disjunctive guards [19], i.e., guards $G \subseteq S$ that require that there
exists a process that is in some state $s \in G$, can be modeled by adding an
auxiliary sending action $a_G!!$, and transitions $s \xrightarrow{a_G!!} M_a(s)$ for every $s \in G$, i.e.,
a process in some state $s \in G$ must exist to enable the transition, but apart
from that this process acts like a receiver. Note that this works without adding
a notion of guards to our model.

In what follows, we extend our model to allow conjunctive guards, i.e., guards 
that require that all processes are in some subset of the local state space. 


2.2 Global Synchronization with Guards 


Guarded Processes. A guarded process is a tuple $P_{GSP} = \langle A, S, s_0, T \rangle$, where
all components are as before, except that now we have $T \subseteq S \times A \times \mathcal{P}(S) \times S$,
i.e., transitions are additionally labeled with a subset of $S$, called a guard. A
local transition from state $s$ to state $s'$ on action $a$ with guard $G$ will be denoted
$s \xrightarrow{a, G} s'$. We call a guard $G$ non-trivial if $G \neq S$. W.l.o.g., we assume that for any
global action $a$, all local transitions based on $a$ have the same guard.


Guarded Systems. Let the support of a global state $q$ be $supp(q) = \{ s \in S \mid
q(s) > 0 \}$, i.e., the set of local states that appear at least once in $q$. Then the
semantics of a global transition on action $a$ with guard $G$, denoted $q \xrightarrow{a, G} q'$, is
as defined before, except that the transition is enabled only if $supp(q) \subseteq G$.


Example 4. Consider the global transitions introduced in Example 2, and recall
that global states are given in the order (ENV, ASK, IDLE, PICK, REPORT).
While the transition $(0,0,1,4,0) \xrightarrow{Reset} (1,0,0,4,0)$ would be possible in the
unguarded model, the guard $G_3$ = {REPORT, IDLE} on the Reset action disables
this transition, as $supp((0,0,1,4,0))$ = {PICK, IDLE} $\not\subseteq G_3$. Similarly, from
$q = (1,0,1,2,0)$, while a transition on action Choose is enabled for unguarded
processes, the guard $G_2$ = {PICK, IDLE} on action Choose disables this transition,
since $supp((1,0,1,2,0)) \not\subseteq G_2$.
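
The guard check of Example 4 amounts to a subset test on the support of a global state. A minimal Python sketch follows; the helper names `support` and `guard_enabled` are hypothetical, and the guards G2 and G3 are the ones named in the example.

```python
# Sketch: a guarded transition is enabled only if supp(q) is a subset of G.

STATES = ["ENV", "ASK", "IDLE", "PICK", "REPORT"]

def support(q):
    """Local states that are occupied by at least one process in q."""
    return {s for s, count in zip(STATES, q) if count > 0}

def guard_enabled(q, guard):
    """True iff every occupied local state lies inside the guard."""
    return support(q) <= guard

G2 = {"PICK", "IDLE"}     # guard on Choose
G3 = {"REPORT", "IDLE"}   # guard on Reset

print(guard_enabled([0, 0, 1, 4, 0], G3))  # Reset from (0,0,1,4,0)
print(guard_enabled([1, 0, 1, 2, 0], G2))  # Choose from (1,0,1,2,0)
print(guard_enabled([0, 0, 1, 4, 0], G2))  # Choose from (0,0,1,4,0)
```

The first two checks fail, matching the two disabled transitions of Example 4, while Choose remains enabled from (0,0,1,4,0).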
3 Parameterized Verification for GSPs Without Guards


In this section, instead of the parameterized system $M(n)$, we consider an
infinite-state system $M_\infty$ that includes the behaviors of $M(n)$ for every $n$: it
initializes to $M(n)$ for arbitrary $n \in \mathbb{N}$, and then behaves according to the
semantics of a GSP of that size. We are interested in reachability properties
$\varphi_m(s)$, where $M_\infty \models \varphi_m(s)$ is equivalent to $\exists n.\ M(n) \models \varphi_m(s)$, i.e., we are
considering a parameterized reachability property over all instances of $M$.

We use this slightly different model in order to make use of the notion of well- 
structured transition systems (WSTS), as defined by Finkel [30]: an infinite-state 
transition system that is equipped with a well-quasi-order (WQO) on its state 
space and has some additional properties. Finkel and Schnoebelen [31] have 
surveyed existing results on WSTSs and put them into a common framework. 

We will show that, for a suitable WQO, $M_\infty$ is a WSTS, and that this
enables parameterized verification for reachability properties $\varphi_m(s)$.


3.1 Compatibility and Effective Computability of Predecessors 


For the following definitions, fix an infinite set of states $Q$ and a transition relation
$\rightarrow$. Moreover, let $\preceq$ be a WQO on $Q$, i.e., a reflexive and transitive relation
such that, for any infinite sequence $q_0, q_1, q_2, \ldots$ of states from $Q$, there exist
indices $i < j$ with $q_i \preceq q_j$. In particular, $\preceq$ does not admit infinitely decreasing
sequences or infinite anti-chains.


Compatibility. We say that $\preceq$ is compatible with $\rightarrow$ if for every $q, q', p \in Q$
with $q \preceq p$ and $q \rightarrow q'$ there exists $p' \in Q$ with $q' \preceq p'$ and $p \rightarrow^* p'$. If
the property also holds after replacing $p \rightarrow^* p'$ with $p \rightarrow p'$, then we say $\preceq$ is
strongly compatible with $\rightarrow$.


Well-Structured Transition System. A transition system $(Q, \rightarrow)$ equipped
with a WQO that is compatible with $\rightarrow$ is called a well-structured transition
system (WSTS).


Upwards-Closed Sets. For a (possibly infinite) subset $U \subseteq Q$, the upwards
closure of $U$ is the set $\uparrow U = \{ p \in Q \mid \exists q \in U : q \preceq p \}$. A set $U$ is upwards
closed if $\uparrow U = U$. Every upwards closed set $U$ has a finite basis: a finite set
$B \subseteq U$ such that $\uparrow B = U$.


Effectively Computable Predecessors. For $U \subseteq Q$, let $Pred(U)$ denote the
predecessor states of $U$ with respect to $\rightarrow$. We say that we can effectively compute
Pred if there exists an algorithm that computes a finite basis of $Pred(U)$ from
any finite basis of any upwards-closed $U \subseteq Q$.


Theorem 1 ([31]). In a WSTS with effectively computable Pred, reachability 
of any upwards-closed set is decidable. 
3.2 Decidability for Unguarded GSPs


We prove that any unguarded GSP is a WSTS with effectively computable Pred,
which implies that reachability properties are decidable for GSPs. To this end,
let $\preceq$ be the component-wise order on global state vectors $q, p$:


$q \preceq p$ iff $q(s) \leq p(s)$ for all $s \in S$.


Note that with respect to this WQO, the set of global states $q$ with $q(s) \geq m$
is an upwards-closed set, i.e., if we can decide reachability of upwards-closed sets,
then we can decide reachability properties $\varphi_m(s)$. Thus, decidability of checking
$M_\infty \models \varphi_m(s)$ follows from the following theorem.


Theorem 2. If $M_\infty$ is based on an unguarded GSP process, then $M_\infty$ equipped
with $\preceq$ is a WSTS and we can effectively compute Pred.


Proof. To prove that $M_\infty$ is a WSTS, we show strong compatibility of transitions
w.r.t. the component-wise order $\preceq$. We consider the following two cases separately:
(i) k-sender transitions, and (ii) k-maximal transitions.

(i) For k-sender transitions, let $q \preceq p$ and $q \xrightarrow{a} q'$ for some k-sender action
$a$. Then $q' = M_a \cdot (q - v_a) + v'_a$ for the synchronization matrix $M_a$ and vectors
$v_a, v'_a$ associated with action $a$. First observe that since $q \preceq p$, there is also a
transition $p \xrightarrow{a} p'$ with $p' = M_a \cdot (p - v_a) + v'_a$. Moreover, we have $M_a \cdot q \preceq M_a \cdot p$, and
therefore $M_a \cdot (q - v_a) + v'_a \preceq M_a \cdot (p - v_a) + v'_a$, i.e., $q' \preceq p'$.

(ii) For k-maximal transitions, consider again $q \preceq p$ and $q \xrightarrow{a} q'$, where now $a$
is a k-maximal action. Then $q' = M_a \cdot (q - u_{a,q}) + u'_{a,q}$ for some vectors $u_{a,q}, u'_{a,q}$
with $\sum_{s \in S} u_{a,q}(s) = \sum_{s \in S} u'_{a,q}(s) \leq k$. Again, first observe that since $q \preceq p$, a
transition $p \xrightarrow{a} p'$ is enabled, where $p' = M_a \cdot (p - u_{a,p}) + u'_{a,p}$ and $u_{a,p}(s) \geq
u_{a,q}(s)$, $u'_{a,p}(s) \geq u'_{a,q}(s)$ for all $s \in S$. Note that, for any $s \in S$, we can have
$u_{a,p}(s) > u_{a,q}(s)$ only if $q(s) - u_{a,q}(s) = 0$ and $p(s) > q(s)$. Furthermore,
$u_{a,p}(s) - u_{a,q}(s) \leq p(s) - q(s)$. Therefore, we get $q - u_{a,q} \preceq p - u_{a,p}$, which
implies $M_a \cdot (q - u_{a,q}) \preceq M_a \cdot (p - u_{a,p})$, and thus $M_a \cdot (q - u_{a,q}) + u'_{a,q} \preceq
M_a \cdot (p - u_{a,p}) + u'_{a,p}$, i.e., $q' \preceq p'$.

Next, we prove that we can effectively compute the basis of $Pred(C)$, where
$Pred(C)$ is the set of states from which a transition exists to a state in an
upwards-closed set $C$, as follows:

(i) For a k-sender transition based on action $a$, any predecessor $q$ in $Pred(C)$
must satisfy (a) $v_a \preceq q$, and (b) $M_a \cdot (q - v_a) + v'_a = q'$, for some $q' \in C$. The
basis of $Pred(C)$ consists of the minimal elements (w.r.t. $\preceq$) that satisfy these
conditions, and thus is computable.

(ii) For k-maximal transitions, the proof works in the same way, except that 
now we may have multiple possibilities of what a minimal predecessor could be, 
based on different subsets of the senders being present or not. Since this is always 
a finite case distinction, effective computability of Pred is still guaranteed. 
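
To illustrate case (i), the following Python sketch enumerates the minimal predecessors of the upward closure of a single basis element c under a k-sender action. It is one possible way to carry out the computation, not necessarily the procedure used by the authors; the function names are hypothetical. It relies on the property stated in Sect. 2.1 that every column of the synchronization matrix is a unit vector, so each receiver state has exactly one target.

```python
# Sketch: minimal q with v_a <= q and M_a * (q - v_a) + v'_a >= c.
from itertools import product

def compositions(total, parts):
    """All tuples of `parts` non-negative integers that sum to `total`."""
    if parts == 0:
        return [()] if total == 0 else []
    if parts == 1:
        return [(total,)]
    return [(first,) + rest
            for first in range(total + 1)
            for rest in compositions(total - first, parts - 1)]

def pred_basis_k_sender(c, m, v, v_prime):
    n = len(c)
    # Processes still needed in each target state after the senders arrive.
    deficit = [max(c[t] - v_prime[t], 0) for t in range(n)]
    # preimages[t]: receiver states whose a?? transition leads to t.
    preimages = [[s for s in range(n) if m[t][s] == 1] for t in range(n)]
    per_target = []
    for t in range(n):
        options = compositions(deficit[t], len(preimages[t]))
        if not options:      # nobody can move into t: no predecessor at all
            return []
        per_target.append(options)
    basis = set()
    for choice in product(*per_target):
        receivers = [0] * n
        for t, split in enumerate(choice):
            for s, count in zip(preimages[t], split):
                receivers[s] += count
        basis.add(tuple(receivers[s] + v[s] for s in range(n)))
    return sorted(basis)

M_SMOKE = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 1],
]
V_SMOKE, V_SMOKE_PRIME = [0, 1, 0, 0, 0], [0, 0, 0, 1, 0]

# Predecessors (via Smoke) of "at least two processes in PICK".
print(pred_basis_k_sender([0, 0, 0, 2, 0], M_SMOKE, V_SMOKE, V_SMOKE_PRIME))
```

For a full basis of Pred(C), the same computation would be repeated for every basis element of C and every action, keeping only the minimal results.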
4 Parameterized Verification for GSPs with Guards


For GSPs with guards, compatibility under the component-wise order $\preceq$ in general
does not hold, since for $q \preceq p$, a transition on action $a$ that is enabled in $q$ may
not be enabled in $p$. Furthermore, note that even strong restrictions on processes are
unlikely to yield compatibility with respect to $\preceq$, since whenever $supp(q) \subseteq G$ for a
non-trivial $G$, one can always find a $p$ with $q \preceq p$ and $supp(p) \not\subseteq G$, disabling the action.

Therefore, we introduce a refined WQO, denoted $\trianglelefteq$, that is based on the
semantics of guards, as well as sufficient conditions on the guarded process $P$,
such that the system $M_\infty$ is a WSTS and we can effectively compute Pred.

Let $\mathcal{G}$ be the set of guards that appear on transitions in $P$, and recall that
$supp(q) = \{ s \in S \mid q(s) > 0 \}$. Then we consider the following WQO³:


$q \trianglelefteq p$ iff $\big( q \preceq p \wedge \forall G \in \mathcal{G} : (supp(q) \subseteq G \implies supp(p) \subseteq G) \big)$.


Intuitively, a global state $p$ is considered greater than a global state $q$ if $p$
has at least as many processes as $q$ in any given state, and for every transition
$q \xrightarrow{a} q'$ that is enabled in $q$, a transition on action $a$ is also enabled in $p$.
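
The refined order can be checked directly from its definition. The Python sketch below uses hypothetical helper names, and the guard set {G1, G2, G3} is taken from the running example of Fig. 1 as described in Examples 4 and 5.

```python
# Sketch: q is below p in the refined order iff q <= p component-wise and
# every guard satisfied by supp(q) is also satisfied by supp(p).

STATES = ["ENV", "ASK", "IDLE", "PICK", "REPORT"]
GUARDS = [{"ENV", "ASK"}, {"PICK", "IDLE"}, {"REPORT", "IDLE"}]  # G1, G2, G3

def support(q):
    return {s for s, count in zip(STATES, q) if count > 0}

def leq(q, p):
    """Component-wise order on global states."""
    return all(a <= b for a, b in zip(q, p))

def refined_leq(q, p, guards=GUARDS):
    if not leq(q, p):
        return False
    sq, sp = support(q), support(p)
    return all((not sq <= g) or sp <= g for g in guards)

print(refined_leq([0, 0, 1, 1, 0], [0, 0, 2, 3, 0]))  # same support
print(refined_leq([0, 0, 1, 1, 0], [1, 0, 1, 1, 0]))  # p violates G2
```

In the second call p is component-wise larger than q, but the extra process in ENV takes p out of guard G2, so the two states are not related by the refined order.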

We will see that compatibility with respect to $\trianglelefteq$ can only be ensured under
additional conditions, as formalized in the following.


4.1 Guard-Compatibility and Well-Behaved Processes 


Strong Guard-Compatibility for k-Sender Actions. For a k-sender action
$a$ with local sending transitions $s_i \xrightarrow{a_i!!,\, G} s'_i$ for $i \in \{1, \ldots, k\}$, let $\hat{s}$ be the set of
all states $s_i$, $\hat{s}'$ the set of states $s'_i$, and $M_a$ the synchronization matrix. We say
that action $a$ is strongly guard-compatible if the following holds for all $G' \in \mathcal{G}$:


$\hat{s}' \subseteq G' \implies \forall s \in G : M_a(s) \in G'$    (C1)


Intuitively, if all senders move into a guard $G'$, then also all receivers need
to move into $G'$. This ensures that if $G'$ is satisfied after the transition in a
system of a given size, then it is satisfied after that transition in a system of any
bigger size, because any additional receivers must also move into $G'$. Note that
Condition (C1) always holds for trivial guards.


Strong Guard-Compatibility for k-Maximal Actions. For a k-maximal 
action a, the idea of the condition is the same as before, but it must be extended 


3 We show that $\trianglelefteq$ is a WQO by proving that every infinite sequence of global states
$q_1, q_2, \ldots$ contains $q_i, q_j$ with $i < j$ and $q_i \trianglelefteq q_j$. To this end, consider an arbitrary
infinite sequence $\bar{q} = q_1, q_2, \ldots$. Then there is at least one set $S'$ of local states such
that infinitely many $q_i$ have $supp(q_i) = S'$. Let $\bar{q}'$ be the infinite subsequence of $\bar{q}$
where all elements have $supp(q'_i) = S'$. Since the component-wise order $\preceq$ is a WQO, there exist $q'_i, q'_j$ with
$i < j$ and $q'_i \preceq q'_j$, and since $supp(q'_i) = supp(q'_j) = S'$, we also get $q'_i \trianglelefteq q'_j$. Since
$q'_i = q_k$ and $q'_j = q_l$ for some $k < l$, we get $q_k \trianglelefteq q_l$ for $k < l$, and thus $\trianglelefteq$ is a WQO.
to allow different subsets of the potential senders to act as actual senders in a
given transition with action a. A simple approximation is that all senders must 
agree, for every G € G, on whether they enter G or not. 

In the following, we formalize a notion that takes into account that transitions
that only use a subset of the potential senders are only possible from certain
global states, and that global states with different sets of actual senders may be
incomparable with respect to $\trianglelefteq$, and therefore unproblematic for compatibility.

We write $t \trianglelefteq s$ if, for all guards $G \in \mathcal{G}$, $s \in G \implies t \in G$. Similarly, we write
$t \trianglelefteq H$ for a set of states $H$ if, for all guards $G \in \mathcal{G}$, $H \subseteq G \implies t \in G$.

Consider a k-maximal action $a$ with local transitions $s_i \xrightarrow{a_i!!,\, G} s'_i$ for $i \in
\{1, \ldots, k\}$ and synchronization matrix $M_a$. Let $R = G \setminus \{s_1, \ldots, s_k\}$ and let $\mathcal{G}'$
be the set of all guards $G_R \in \mathcal{G}$ such that $R \subseteq G_R$.


Then we say the action $a$ is strongly guard-compatible if both of the following
hold for all $G' \in \mathcal{G}$:


$\Big( \bigvee_{1 \leq i \leq k} s'_i \in G' \Big) \implies \big( \forall s \in R : M_a(s) \in G' \big)$    (C2.1)

$\bigwedge_{i \neq j} \big( (s_i \trianglelefteq s_j \wedge s'_j \in G') \implies (s'_i \in G' \wedge M_a(s_i) \in G') \big)$    (C2.2)


Intuitively, if one potential sender moves from a state $s_i$ into a guard $G'$, then
every receiver from $R$ must do the same, so that $G'$ will be satisfied regardless
of the number of receivers. This is also required for other senders and receivers
from a state $s_i \notin R$, unless there exists a guard that is satisfied if $s_j$ is occupied,
but not if $s_i$ is occupied, since that means that a global state where only $s_j$ is
occupied is incomparable (w.r.t. $\trianglelefteq$) to a state where also $s_i$ is occupied, and
therefore we do not care about compatibility of the transitions.

Note that for k = 1, the first condition (C2.1) instantiates to condition (C1) 
and the second condition (C2.2) is an empty conjunction, i.e., vacuously satisfied. 
This is to be expected, since semantically there is no difference between a 1- 
sender action and a 1-maximal action. 


Example 5. We can see that actions Smoke, Choose, and Reset from our
motivating example in Fig. 1 are strongly guard-compatible:

– Smoke is a 1-sender action with sending transition
  ASK $\xrightarrow{Smoke!!,\, \{ENV, ASK\}}$ PICK. The state PICK is only included in one
  non-trivial guard $G_2$ = {PICK, IDLE}. Since receiving transitions from {ENV, ASK}
  end in {PICK, IDLE} $\subseteq G_2$, condition (C1) holds, so Smoke is strongly
  guard-compatible.

– Consider Choose with sending transitions
  PICK $\xrightarrow{Choose_i!!,\, \{PICK, IDLE\}}$ REPORT for $i \in \{1, 2\}$ as a 2-sender action.
  REPORT is only included in one non-trivial guard $G_3$ = {REPORT, IDLE}. Since the
  receiving transition from {PICK} ends in IDLE $\in G_3$ as well, (C1) holds, so
  Choose is strongly guard-compatible.

– Consider Choose as a 2-maximal action. Again, REPORT is only included in
  one non-trivial guard $G_3$ = {REPORT, IDLE}. Since all senders and receivers
  start from PICK and end up in a state in $G_3$, conditions (C2.1) and (C2.2)
  hold and Choose is, again, strongly guard-compatible.

– Reset is a negotiation action. Recall that negotiations are modeled as a set
  of 1-sender actions, allowing for an arbitrary sender. Therefore, each of these
  broadcasts must satisfy (C1) for the negotiation to be guard-compatible.
  Reset is indeed strongly guard-compatible because all of its sending and
  receiving transitions end in ENV, meaning that when the action fires, all
  processes will move into a single state, ensuring that all guards will be uniformly
  enabled or disabled, regardless of the number of processes, which of them is
  the sender, or whether they begin in REPORT or IDLE.

– Finally, as stated in Sect. 2.1, the internal transition from ENV to ASK can be
  modeled by a 1-sender action, say $a$, with a send transition ENV $\xrightarrow{a!!}$ ASK
  and self-loop receive transitions on all states. The sender ends up in one
  non-trivial guard $G_1$ = {ENV, ASK}. Since receiving transitions from {ENV, ASK}
  end in {ENV, ASK} $\subseteq G_1$, condition (C1) holds, so $a$ is strongly
  guard-compatible.
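
Condition (C1) is a finite check over the guards of the process, so it can be tested mechanically. The Python sketch below is illustrative only: the function name `satisfies_c1` and the dictionary encoding of receive transitions are assumptions, while the guards and the transitions of Smoke and Choose follow Examples 2, 3 and 5.

```python
# Sketch of (C1): if all sender targets lie in G', then every receiver that
# starts inside the action's guard G must also be mapped into G'.

G1, G2, G3 = {"ENV", "ASK"}, {"PICK", "IDLE"}, {"REPORT", "IDLE"}
GUARDS = [G1, G2, G3]

def satisfies_c1(sender_targets, guard, recv, guards=GUARDS):
    """recv maps each local state to its unique a?? successor."""
    return all(
        (not sender_targets <= g_prime)
        or all(recv[s] in g_prime for s in guard)
        for g_prime in guards
    )

# Receive transitions read off the columns of M_Smoke and M_Choose.
RECV_SMOKE = {"ENV": "IDLE", "ASK": "PICK", "IDLE": "IDLE",
              "PICK": "PICK", "REPORT": "REPORT"}
RECV_CHOOSE = {"ENV": "ENV", "ASK": "ASK", "IDLE": "IDLE",
               "PICK": "IDLE", "REPORT": "REPORT"}

print(satisfies_c1({"PICK"}, G1, RECV_SMOKE))     # Smoke, guard G1
print(satisfies_c1({"REPORT"}, G2, RECV_CHOOSE))  # Choose as 2-sender, guard G2

# Hypothetical variant of Smoke in which ENV receivers stay in ENV.
BROKEN = dict(RECV_SMOKE, ENV="ENV")
print(satisfies_c1({"PICK"}, G1, BROKEN))
```

The hypothetical variant violates (C1) for G' = G2: the sender enters G2, but a receiver starting in ENV does not.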


Refinement: Weak Guard-Compatibility. To support a larger class of sys- 
tems, we show how one can relax the previous conditions, at the cost of making 
them more complex. The idea is that, instead of requiring that if the sender 
ends up in a guard then the receivers immediately end up in that guard after 
the transition, it is enough if the receivers have a path to a state in that guard. 
To avoid unnecessary complexity, we only consider paths of internal transitions. 

If there exists a path of unguarded internal transitions from $s$ to $s'$, we write
$s \leadsto s'$. Then, condition (C1) can be relaxed to


$\hat{s}' \subseteq G' \implies \forall s \in G : \big( M_a(s) \in G' \vee \exists s' \in S : (s' \in G' \wedge M_a(s) \leadsto s') \big)$.    (C1w)


Actions that satisfy condition (C1w) are called weakly guard-compatible.


Remark. In a similar way, we can relax conditions (C2.1) and (C2.2). Furthermore,
the path $\leadsto$ of internal transitions can be guarded, as long as the guards
are sufficiently general to guarantee that these transitions can be taken. We refer
the interested reader to the extended version [38] for more details.


Well-Behavedness. Based on guard-compatibility, we can now define the class 
of processes that will allow us to retain decidability of reachability properties in 
the parameterized system: We say that a process P is well-behaved if every 
action is (weakly) guard-compatible. 

Note that unguarded processes are trivially well-behaved. 


Example 6. Observing that all actions in the process depicted in Fig.1 are 
(strongly) guard-compatible, it is clear that the process is well-behaved. 
Well-Behaved Systems in the Literature. We want to point out that many
systems studied in the literature are naturally well-behaved. 

For example, Emerson and Kahlon [20] introduce a model for cache coherence 
protocols that is based on broadcast communication and guards. They show that 
many textbook protocols can be modeled under the following restrictions: (i) 
every state is assumed to have an unguarded internal transition to the initial 
state INIT, and (ii) the only conjunctive guard is {INIT}. Clearly, every action 
in a process that satisfies these conditions will also satisfy condition (C1w), and
therefore well-behaved systems subsume and significantly generalize the types of 
protocols considered by Emerson and Kahlon. 

Moreover, there has recently been much research on the verification of round- 
based distributed systems [14,34,37], where processes can move independently 
to some extent, with the restriction that transitions between rounds can only 
be done synchronously for all processes. When abstracting from certain features 
(e.g. fault-tolerance and process IDs), our model is well-suited to express such 
systems: guards can be used to restrict transitions to happen only in a certain 
round, and can furthermore model the “border” of a round that needs to be 
reached by all processes, such that they can jointly move to the next round. 

Our example from Fig. 1 can also be seen as a round-based system: the first
round includes states ENV, ASK, and upon taking the transition on Smoke,
all processes move to the second round, which includes states PICK, IDLE.
From there, on action Choose the system moves to the third round, which
includes states REPORT, IDLE, and on action Reset back to the first round.
Note that the states in different rounds are exactly the guards that are used in 
the transitions—or seen the other way around, guards induce a set of rounds 
on the local state space, and the guard-compatibility conditions ensure that 
processes move between these rounds in a systematic way. 

While the rounds are very simple in this example, the technique is much 
more general and can be used to express many round-based systems, including 
those described in Sect. 6. 


4.2 Decidability for Well-Behaved Guarded Processes 


Based on the notion of well-behavedness, we can now obtain a decidability result
that works in the presence of guards. The following theorem implies that parameterized
verification for properties $\varphi_m(s)$ is decidable for well-behaved processes.


Theorem 3. If $M_\infty$ is based on a well-behaved GSP process, then $M_\infty$ is a
WSTS and we can effectively compute Pred.


Proof. To prove that $M_\infty$ is a WSTS, we show compatibility of transitions w.r.t.
the refined order $\trianglelefteq$, i.e., if $q \trianglelefteq p$ and $q \rightarrow q'$, then $\exists p'$ with $q' \trianglelefteq p'$ and $p \rightarrow^* p'$. We consider
two cases: (i) k-sender transitions, and (ii) k-maximal transitions.


(i) Suppose $a$ is a k-sender action. Let $q \xrightarrow{a, G} q'$ be a transition and $q \trianglelefteq p$.
Since $q \trianglelefteq p$ implies that $supp(p) \subseteq G$, we know that transition $p \xrightarrow{a, G} p'$
is possible, and by the proof of Theorem 2 we know that $q' \preceq p'$ holds component-wise. To prove
compatibility with respect to $\trianglelefteq$, it remains to show that $\forall G' \in \mathcal{G} : (supp(q') \subseteq
G' \implies supp(p') \subseteq G')$.

First assume that condition (C1) holds. Then, let $G' \in \mathcal{G}$ be an arbitrary
guard. By (C1), we either have $\hat{s}' \not\subseteq G'$, in which case the desired condition is
satisfied for $G'$, or we have that $\forall s \in G : M_a(s) \in G'$, i.e., all potential receivers
move into $G'$. Thus, we get $supp(q') \subseteq G'$ iff $supp(p') \subseteq G'$, satisfying the desired
condition.

If instead of (C1) the action satisfies (C1w), the argument is the same, except
that if necessary we use the internal transitions that are guaranteed to exist by
the condition to arrive in a state $p'$ with $q' \trianglelefteq p'$.

(ii) Suppose $a$ is a k-maximal action with local transitions $s_i \xrightarrow{a_i!!,\, G} s'_i$ for
$i \in \{1, \ldots, k\}$ and synchronization matrix $M_a$. By the proof of Theorem 2 we
know that there exists a transition $p \xrightarrow{a, G} p'$ with $q' \preceq p'$ component-wise, and it remains to
show that $\forall G' \in \mathcal{G} : (supp(q') \subseteq G' \iff supp(p') \subseteq G')$.

Let $G' \in \mathcal{G}$ be an arbitrary guard, and assume the action is strongly guard-compatible.
By condition (C2.1) we know that if there is a single local sending
transition with $s'_i \in G'$, then all receivers will move into $G'$. So first suppose
there is no such local transition: then $G'$ cannot be satisfied in $q'$ (since at least
one sender must be present), and the desired property holds. Conversely, suppose
there is such a local transition: then all processes that start in $R$ will be mapped
into $G'$, so $G'$ will be satisfied iff all remaining processes are mapped into $G'$.


Now, suppose that all local transitions taken in $q \xrightarrow{a, G} q'$ are such that $s'_i \in G'$
(for otherwise $q'$ does not satisfy $G'$). Since $q \trianglelefteq p$, there exists a transition
$p \xrightarrow{a, G} p'$ such that the set of local transitions that are fired in $q \xrightarrow{a, G} q'$ is a
subset of the local transitions that are fired in $p \xrightarrow{a, G} p'$. If all sending transitions
taken in $p \xrightarrow{a, G} p'$ are also such that $s'_i \in G'$, then by conditions (C2.1) and
(C2.2) the same will hold for all receiving transitions from $p$, and therefore,
$supp(p') \subseteq G'$. Thus, suppose there is a local transition $s_i \xrightarrow{a_i!!,\, G} s'_i$ that is taken
in $p \xrightarrow{a, G} p'$, but not in $q \xrightarrow{a, G} q'$, and $s'_i \notin G'$. Let $s_j \xrightarrow{a_j!!,\, G} s'_j$ be an arbitrary
local transition that is taken in $q \xrightarrow{a, G} q'$. Then by condition (C2.2), either there
must be a guard $G'' \in \mathcal{G}'$ with $s_i \notin G'' \wedge s_j \in G''$, contradicting the assumption
that $q \trianglelefteq p$, or we have $s'_j \in G' \implies s'_i \in G' \wedge M_a(s_i) \in G'$, contradicting the
assumption that $s'_i \notin G'$.

Again, if the action is weakly guard-compatible, the argument can be 
extended by using the paths of internal transitions, if necessary. 

Effective computability of Pred follows from the proof of Theorem 2— 
the only difference is that we must consider the guards, i.e., a predeces- 
sor is only valid if it additionally satisfies the guard of the transition under 
consideration. 
5 Cutoffs for GSPs


We investigate cutoff results for GSPs and their connection to the decidability
results in Theorems 2 and 3. While the proofs of these theorems yield a decision
procedure for parameterized verification, a cutoff result is more versatile as it
reduces parameterized verification to a problem over a fixed number of processes,
and under certain conditions can also be used for parameterized synthesis [39].


5.1 Definition and Basic Observations


A cutoff for a class of processes $\Pi$ and a class of properties $\Phi$ is a number $c \in \mathbb{N}$
such that for every $P \in \Pi$ and $\varphi \in \Phi$,


$M_\infty \models \varphi \iff M(c) \models \varphi$.


We show how to obtain cutoffs for well-behaved GSPs that satisfy additional
conditions, and for reachability properties of the form $\varphi_m(s)$, based on observations
from the proof of Theorem 2. While for any given parameterized system
and any safety property a cutoff exists [45], a general cutoff, even if it can be
computed, may be too large to be of practical value: it has been shown that for
broadcast protocols the time complexity of checking reachability is non-primitive
recursive in the size of the processes [51], and from the proof one can conclude
that the same must hold for the size of cutoffs.


Fig. 2. Example witnessing quadratic cutoff. Not depicted are additional sending
transitions on $a!!$ from every state in the outer cycle to $s_\bot$.
Example: Quadratic Cutoffs. Consider the (unguarded) process in Fig. 2. We
are interested in a lower bound on the cutoff for this process, with respect to
$\varphi_1(s_9)$, i.e., reachability of $s_9$ by at least one process. Note that to reach $s_9$,
we need at least one process in $s_8$ and one in $s_5$ at the same time. From the
initial state $s_0$, the only possible action is $i$, sending one process to $s_6$ in the
inner cycle and all other processes to $s_1$ in the outer cycle. Then, the only way
to make progress is action $a$, moving the process in the inner cycle to $s_7$, the
sending process from $s_1$ to $s_\bot$ (sending transitions on $a!!$ are not depicted in
Fig. 2), and all other processes to $s_2$. After three further transitions on $a$, the
outer processes are in $s_5$, where the sending transition on $b!!$ could be fired, but
the process in the inner cycle is in $s_7$, so additional transitions on $a$ are required.
Only after two additional rounds around the outer cycle do we arrive in a state
where both $s_5$ and $s_8$ are occupied, and we can take the final transition on $b$
that takes one process into $s_9$. To arrive there, we took 16 transitions (one on
$i$, 14 on $a$, and one on $b$), and by construction every process can only take one
sending transition in a run. Thus, we need a system with at least 16 processes to
have one of them reach $s_9$, and no smaller number can be a cutoff for $\varphi_1(s_9)$.

To see that cutoffs grow at least quadratically, note that in similar examples
where the inner and outer cycles consist of p1 and p2 states, respectively, and p1
and p2 are relatively prime, we need p1 · p2 + 1 processes to reach the target state.
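
The count in this example can be replayed mechanically. The following Python sketch is a simplified abstraction of the process in Fig. 2, not the formal GSP semantics: it assumes that every broadcast on a advances both the inner-cycle process and the non-sending outer-cycle processes by one state, and that the final broadcast on b becomes enabled once both are in the last state of their cycle. The function names are hypothetical.

```python
from math import gcd


def transitions_to_target(p1: int, p2: int) -> int:
    """Count the global transitions needed before the target state is reached.

    Assumed abstraction: after the initial broadcast, one process is at
    position 0 of an inner cycle of length p1 and the other processes are at
    position 0 of an outer cycle of length p2. Each broadcast on `a` advances
    both positions by one (modulo the cycle length).
    """
    if gcd(p1, p2) != 1:
        raise ValueError("p1 and p2 must be relatively prime")
    inner, outer, a_steps = 0, 0, 0
    # The broadcast on `b` needs the last inner state and the last outer state
    # to be occupied at the same time.
    while not (inner == p1 - 1 and outer == p2 - 1):
        inner = (inner + 1) % p1
        outer = (outer + 1) % p2
        a_steps += 1
    return 1 + a_steps + 1  # initial broadcast + broadcasts on a + final b


def processes_needed(p1: int, p2: int) -> int:
    """Every process sends at most once, so one process is needed per transition."""
    return transitions_to_target(p1, p2)
```

For p1 = 3 and p2 = 5 the sketch yields the 16 processes of the example, i.e. p1 · p2 + 1.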


5.2 Conditions for Small Cutoffs 


We introduce sufficient conditions on processes that allow us to obtain small 
cutoffs. These conditions are inspired by our intended applications (see Sect. 6), 
and based on insights from the decision procedure in the proof of Theorem 2 and 
the example above. We observe that any q ∈ Pred(C) that reaches a state q′ ∈ C
through a k-sender action a must satisfy (i) v_a ≤ q, and (ii) M_a · (q − v_a) + v′_a =
q′. Thus, if there is q′ ∈ C such that ¬(v′_a ≤ q′), we need to consider a predecessor
q with |q| > |q′|. It is easy to see that this can only happen if q′ contains
processes in states that are reachable via a only by a receiving
transition, or by a sending transition if k > 1. Thus, we want to avoid that states
we are interested in are only reachable through such transitions.

We restrict our attention to specifications φ_m(s) and to cases where we can
identify conditions on a GSP process P such that the cutoff for such specifications
is c = m. If this is the case, then we say that reachability of s is synchronization-
independent in P, and that the pair (P, φ_m(s)) is cutoff-amenable.

We begin with a simple case, where systems are restricted to only internal 
transitions and negotiations (we defined in Sect. 2.1 how these are expressed in 
terms of 1-sender transitions). 


Lemma 1. Let P = (A, S, s0, T) be a well-behaved GSP process such that
all transitions are internal transitions or negotiations. Then reachability of s is
synchronization-independent in P for every s ∈ S.


Proof. To see this, first consider a system with n > m processes, where eventually
m of them reach s. We can simulate this run in a system with m processes by
simply keeping the m processes that reach s, and removing all others. Similarly,
if all processes in a system of size m eventually reach s, then we can simulate this 
run in a bigger system by adding processes that “follow” the internal transitions 
of the other processes such that always the same guards as in the original run 
will be satisfied. Well-behavedness ensures that this is always possible. 


While we are in general not interested in systems that only communicate 
through internal transitions and negotiations, we can refine this observation 
based on the states we are interested in, and allow other types of communication. 

To this end, define a transition of a process P to be free if it is (i) an internal
transition, (ii) a sending transition of either a broadcast (i.e., a 1-sender action)
or a k-maximal action, or (iii) a receiving transition $s \xrightarrow{a??,\,G} s'$ of a broadcast
with matching sending transition $s \xrightarrow{a!!,\,G} s'$. Note that the latter includes nego-
tiation transitions. A path from one state to another is free if all transitions on
the path are free. The idea is that free transitions and paths are only restricted 
by guards (i.e., the absence of processes in certain states), but not by the exis- 
tence of other processes in certain states (as, e.g., a 2-sender transition would 
be, since a sender depends on the presence of another sender to be able to fire 
the global transition and move along its own local transition). 


Lemma 2. Let P = (A, S, s0, T) be a well-behaved GSP process, and s ∈ S
such that all paths from s0 to s in P are free. Then reachability of s is
synchronization-independent in P.


Proof. The argument follows the same line as the one above for protocols with 
only internal transitions and negotiations, since the same transitions for existing 
processes are also possible if we can ensure that the same guards can be satisfied 
in the bigger system. Well-behavedness ensures that there is a run in the bigger 
system where the same guards are satisfied. 
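
The premise of Lemma 2 can be checked by plain graph search on the local transition graph. The sketch below is illustrative only: it assumes that every transition has already been classified as free or not (the classification itself is not shown), and that a path is any sequence of local transitions, ignoring guards. The encoding and the function name are hypothetical.

```python
from collections import defaultdict


def all_paths_free(transitions, s0, target):
    """Return True if every path from s0 to target uses only free transitions.

    `transitions` is a list of triples (source, destination, is_free).
    """
    succ, pred = defaultdict(set), defaultdict(set)
    for src, dst, _ in transitions:
        succ[src].add(dst)
        pred[dst].add(src)

    def closure(start, edges):
        # Standard depth-first search collecting all states reachable via `edges`.
        seen, stack = {start}, [start]
        while stack:
            state = stack.pop()
            for nxt in edges[state]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    forward = closure(s0, succ)       # states reachable from s0
    backward = closure(target, pred)  # states from which target is reachable
    # A transition lies on some path from s0 to target exactly when its source
    # is forward-reachable and its destination is backward-reachable.
    return all(is_free
               for src, dst, is_free in transitions
               if src in forward and dst in backward)
```

Both searches are linear in the size of the transition graph, so a check of this kind runs in polynomial time.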


We require that all paths be free, since existence of a free path is not sufficient 
in general: if m > 1, then the first process that moves along that free path 
may force other processes to leave it (e.g., by taking a sending transition of a 
broadcast). However, this condition is still slightly restrictive, and can be relaxed. 

Define a simple path as a path with no repeated states. We show that under 
additional conditions, it is enough to consider restrictions that are based on 
paths that are simple and free: 


Lemma 3. Let P = (A, S, s0, T) be a well-behaved GSP process, s ∈ S, and
let F be the set of simple free paths from s0 to s. If for each send transition:


1. the transition does not appear in paths in F and the corresponding receiving
transitions $s_s \xrightarrow{a??,\,G_a} s_d$ with s_s ∈ p for some p ∈ F have s_d = s_s, or,
2. the transition appears in paths in F and the following holds for every corre-
sponding receive transition $s_s \xrightarrow{a??,\,G_a} s_d$ where s_s ∈ p for some p ∈ F and
s_d ∉ p for any p ∈ F: either (a) there exists an internal transition s_s → s′_s
with s′_s ∈ p for some p ∈ F, or (b) all paths out of s_d lead back to a state s_f
in a path in F and are free between s_d and s_f,


then reachability of s is synchronization-independent in P.


Proof. First consider a run of a system that satisfies the above conditions, and 
has n > m processes, where eventually m of them reach s. We can simulate 
this run in a system with m processes by keeping the m processes that reach s, 
and removing all others. Note that the sending transitions are on the same free 
simple path from which processes can diverge using the corresponding receiving 
or sending transitions, or they do not affect them at all. Hence, at least one of 
the senders is guaranteed to reach s. All other senders and receivers may diverge 
from a simple free path but are guaranteed a free path back to a state along a 
free path and hence, can reach s freely. 

Now assume that all processes in a system of size m eventually reach s, then 
we can simulate this run in a bigger system by adding processes (that behave in 
the same way as an existing process). Note that, since any transition diverging 
from a free simple path can only be triggered by a sending transition on that 
same free path, it is impossible to add a sender that can make processes diverge 
and then not reach s after. 


Example 7. In this example we show how Lemma 3 applies to the example in
Fig. 1. Here s0 is the ENV state, s is the REPORT state, and the value of m is 3
(since the safety specification is: no more than 2 detectors can report the fire).
The set of simple free paths F is:


$\text{ENV} \rightarrow \text{ASK} \xrightarrow{Smoke!!} \text{PICK} \xrightarrow{Choose_i!!} \text{REPORT}$ for i ∈ {1,2}, and

$\text{ENV} \rightarrow \text{ASK} \xrightarrow{Smoke??} \text{PICK} \xrightarrow{Choose_i!!} \text{REPORT}$ for i ∈ {1,2}.


It is clear that all the sending transitions Smoke!!, Choose_1!!, Choose_2!!
appear only in F. Furthermore, the corresponding broadcast-receive transitions
satisfy the required conditions as follows:


— the transition $\text{ENV} \xrightarrow{Smoke??} \text{IDLE}$ satisfies condition (2a) because the internal
transition ENV → ASK exists in a path in F.

— the transition $\text{PICK} \xrightarrow{Choose??} \text{IDLE}$ satisfies condition (2b) since all paths
out of IDLE are free (namely, the negotiation transition from IDLE to ENV)
and lead back to a path in F.


Since the conditions of Lemma 3 hold, the reachability of s is
synchronization-independent and the cutoff is 3.


Checking the Cutoff Conditions. Note that while the conditions in Lemma 3
seem complex, all our cutoff conditions can be checked on the process definition
in polynomial time, making them well-suited for fully automatic verification.


6 Applications and Evaluation


To evaluate our approach, we consider several distributed applications that use 
agreement protocols like consensus or leader election, and that can be modeled 
as well-behaved systems that satisfy one of our cutoff lemmas: 


— Chubby [11]: A distributed lock service for coarse-grained synchronization
with an elected leader node that handles client messages.

— Distributed Smoke Detector (SD): A sensor network application that elects a
subset of the processes that have detected smoke to report to the authorities.

— Smoke Detector with Reset (SDR): A variant of SD that uses a “reset” signal
to resume monitoring for smoke, thereby requiring infinitely many rounds of
agreement (this was our motivating example in Fig. 1).

— Distributed Mobile Robotics (DMR): Based on an existing benchmark [18],
where a set of robots successively coordinate to create a motion plan.

— Distributed Key-Value Store (KVS): A model of a key-value store à la Redis [48].

— Small Aircraft Transportation System (SATS): The landing protocol of SATS
proposed by NASA [53]. SATS aims to increase access to small airports with-
out control towers by allowing aircraft to coordinate with each other to
operate safely upon entering the airport airspace.

— SATS++: A variant of the SATS protocol where all processes communicate
explicitly to determine subsets of aircraft to coordinate the landing with.


In addition, we provide an experimental evaluation, based on related 
work [37] in which a new model—the CHOOSE model—that can be seen as a 
refinement of GSP, is proposed. The CHOOSE model extends a standard model 
of distributed systems [2,3] with a primitive that abstracts various types of 
distributed agreement protocols. The work further defines a mapping from the 
CHOOSE model to GSP that establishes a simulation equivalence between the 
two models, enabling interchange of safety verification and cutoff results between 
the two models. 


Table 1. Performance of parameterized verification based on our cutoffs.


Benchmark | States | Cutoff | Verification time (s)
Chubby    | 9      | 2      | 0.12
SD        | 5      | 3      | 0.28
SDR       | 5      | 3      | 0.13
DMR       | 8      | 3      | 0.16
KVS       | 18     | 3      | 3.06
SATS      | 24     | 5      | 3.83
SATS++    | 26     | 5      | 17.1


To make use of the ease of encoding the above benchmarks in the CHOOSE model
and the ease of verification in the CHOOSE model using off-the-shelf model
checkers, we illustrate the effect of our cutoff results on efficiency of
verification in the CHOOSE model. For the benchmarks given above, Fig. 3 depicts
the verification time as a function of the number of processes. Observe that
verification time grows roughly exponentially with the number of processes.
Moreover, verification for all the benchmarks timed out beyond 9 processes, for a
timeout of 30 min. In contrast, in Table 1 all benchmarks have a cutoff of less
than 6, and reasonable verification times.


[Figure: verification time (log scale) plotted against the number of processes,
one curve per benchmark.]

Fig. 3. Verification time as a function of the number of processes.


7 Related Work 


Bodies of work that aim at automatically solving the parameterized verification 
problem (which is undecidable in the most general case [23,54]) take a large vari- 
ety of different approaches [1,10,13,33,35,41,43,47,56], in most cases without 
a focus on decidability. In the following we consider the approaches that target 
decidability, with models closely related to our GSP model. 


Models with Broadcasts and/or Global Guards. We want to enable rea- 
soning about distributed systems, abstracting complex building blocks like agree- 
ment protocols by primitives that satisfy assume-guarantee specifications. To 
support parameterized reasoning for systems with such abstractions, one needs 
a model with (i) conjunctive guards to model the assumptions, and (ii) forms 
of synchronization that are sufficiently general to model the guarantees of those 
building blocks, i.e., generalizations of broadcast communication. 

Esparza et al. [28] present a decidability result for safety properties of broad- 
cast protocols, but without global guards. Their result is also based on a reduc- 
tion to WSTSs, but we showed that the WQO presented in their work (corre- 
sponding to the WQO < in Sect. 3.2) is not suitable for systems with guards. We 
note that our GSP model subsumes the model of Esparza et al., and that our 
cutoff results also apply to their model (which had no previous cutoff results). 

Other existing models either are not sufficiently general [19,20,22], or support 
a combination of broadcasts and conjunctive guards without restrictions [21], 
which makes safety undecidable. This highlights the significance of our result: 
we manage to find a model with conjunctive guards and global synchronization 
such that safety remains decidable.


Other Decidable Classes. One way to obtain decidability is to restrict the
generality of the parameterized verification problem in various ways. Most results
in this direction consider a fully connected network (a clique), either with ren-
dezvous communication [5,32], local updates with global guards [6,19], or vari-
ants of these [16]. Some communication primitives have also been considered in
more complex networks, for example token passing [4,12,24], or broadcasts [17].
Decidability results for systems that are composed of identical components have
recently been surveyed by Bloem et al. [9] as well as Esparza et al. [26]. Sev-
eral bodies of work attempt to identify cutoff bounds for different classes of
distributed systems. For example, cutoffs have been obtained for cache coher- 
ence protocols [20], guarded protocols [19,21,40], consensus protocols [44], and 
self-stabilizing systems [8]. None of these approaches are sufficiently general to 
tackle the types of distributed applications we address. 


Petri Nets and Vector Addition Systems. Also closely related to the param- 
eterized verification problems we consider is the body of work on Petri nets and 
vector addition systems, surveyed e.g. by Esparza and Nielsen [29] or Reisig [49]. 
While some types of communication can faithfully be expressed in these systems, 
global synchronization in general cannot. 


8 Conclusion 


We introduced global synchronization protocols (GSP), a system model that 
generalizes many existing models supporting global synchronization such as 
broadcast synchronization, pairwise rendezvous, and asynchronous rendezvous. 
We identified sufficient conditions, summarized under our notion of well- 
behavedness, that ensure decidability of the parameterized verification problem 
even in the presence of global (conjunctive) transition guards. Finally, we inves- 
tigated cutoffs for parameterized verification, and identified sufficient conditions 
under which small cutoffs exist. 

In ongoing work, we are focusing on extensions of our cutoff results as well as a 
dedicated implementation of our decision procedure. In the near future, we plan 
to investigate sufficient conditions that enable support for the parameterized 
verification of liveness properties for GSPs, and intend to develop a domain-
specific language for writing GSPs that are well-behaved by construction.


Abstract. Replication is a common technique to build reliable and scal-
able systems. Traditional strong consistency maintains the same total 
order of operations across replicas. This total order is the source of mul- 
tiple desirable consistency properties: integrity, convergence and recency. 
However, maintaining the total order has proven to inhibit availability 
and performance. Weaker notions exhibit responsiveness and scalability; 
however, they forfeit the total order and hence its favorable properties. 
This project revives these properties with as little coordination as pos-
sible. It presents a tool called HAMPA that, given a sequential object
with the declaration of its integrity and recency requirements, automat-
ically synthesizes a correct-by-construction replicated object that simul-
taneously guarantees the three properties. It features a relational object
specification language and a syntax-directed analysis that infers optimum 
staleness bounds. Further, it defines coordination-avoidance conditions 
and the operational semantics of replicated systems that provably guar- 
antees the three properties. It characterizes the computational power and 
presents a protocol for recency-aware objects. HAMPA uses automatic 
solvers statically and embeds them in the runtime to dynamically decide 
the validity of coordination-avoidance conditions. The experiments show 
that recency-aware objects reduce coordination and response time. 


1 Introduction 


Replicated objects [12,13,23,32,45] are pervasively used for fault-tolerance, 
availability, responsiveness and scalability. They are used in diverse application 
areas [14,20-22,37,39,40,50,53] including embedded controllers, online services
and game engines. However, coordinating the replicas has proven to be chal- 
lenging. Strongly consistent replication, provided by consensus protocols such 
as Viewstamp [42], Paxos [34] and Raft [44], guarantees the same total order of 
operations across replicas. The total order simultaneously provides a host of
favorable properties: integrity, convergence and recency. Replicas converge to the 
same state as the result of the same sequence of operations. Further, a propa- 
gated operation executes in the same state as the originating replica. Therefore, 
if an operation preserves the integrity properties [8] at the originating replica, it
will certainly preserve them in the other replicas as well. In addition, the lock-
step execution keeps the replicas recent: an operation executes in all replicas
before the next. Thus, replicas can be stale by at most one operation.

However, strong consistency may not be available and responsive during 
network failures or offline use. Further, its scalability is limited. The trade- 
off between strong consistency of replicated objects, and their availability and 
responsiveness is a famous dilemma [1,3,26-28]. Therefore, system designers
opted for weaker notions of consistency such as eventual [4,15,17,19,24,25,48,52]
and causal [2,13,33] consistency that can provide availability, responsiveness and
scalability but lose the same total order of operations. Several projects [16,49,51]
provide programming interfaces for weak consistency notions. Unfortunately, the 
large collection of subtle weak consistency notions is unintuitive to users. If the 
chosen notion is too weak, it can affect correctness, and if it is too strong, it may 
degrade scalability. 

Therefore, researchers have recently provided high-level abstractions to shield 
the user from low-level complexities of weak consistency. These projects seem to
be steps towards reviving the same three pillars of consistency, i.e., integrity,
convergence and recency, with as little coordination [7,35,47] as possible. CRDTs
[48] revived convergence. If an object satisfies a few algebraic properties, its repli- 
cation can enjoy convergence even on top of eventual consistency. However, the 
replicas can experience states that violate the integrity properties. Therefore, 
follow-up projects revived the integrity property. CISE [29] and Soteria [41] 
present proof techniques to verify the integrity properties of a replicated object. 
Sieve [36], Indigo [10] and Hamsaz [30] translate the given high-level integrity 
properties to hybrid models. However, they are oblivious to state recency. The
operations are eventually delivered to all replicas; however, they may be arbi-
trarily delayed. Some updates may be delivered too late and expose the clients
to stale data. On the other hand, at the expense of more communication, some 
updates may be immediately sent and delivered. However, applications may 
prefer to obtain more scalability and energy efficiency in return for bounded 
staleness. In fact, many applications such as ticketing, distributed sensors and 
network accounting can work with fairly recent data. Previous work such as 
TACT [55], TRAPP [43], FRACT [59], and PBS [9] considered staleness but did 
not address integrity and communication minimization. Further, they did not 
provide automatic analysis, decision and synthesis. In addition to convergence 
and integrity, this project, HAMPA, revives recency. Given a sequential object 
with the declaration of its integrity properties and recency requirements for its 
methods, it automatically synthesizes a correct-by-construction replicated object 
that guarantees integrity, convergence and recency while avoiding unnecessary 
coordination. 

To capture object specifications from the user, we present a relational lan- 
guage and its denotational semantics. The language provides a complete set of 
relational operators to define the object methods and integrity properties, and 
allows the user to declare recency requirements for the return value of each
method. Given a principled object specification, we present a syntax-directed
analysis that infers optimum staleness bounds for each element of the state. 

We present the conditions required to simultaneously preserve the three prop- 
erties: convergence, integrity and recency. These conditions are used to define 
a novel operational semantics of replicated objects that provably preserve con- 
vergence, integrity and the inferred staleness bound. We observe that recency- 
awareness not only guarantees a limit on the staleness, but also allows buffering 
of calls and reduces the coordination required to preserve integrity. 

We characterize the computational power of recency-aware replicated objects. 
We show that recency-aware objects have the same power as the perfect failure 
detector. We present a novel protocol for recency-aware replicated objects that 
implements the semantics. We use off-the-shelf SMT solvers both statically
and, embedded in the runtime, dynamically to decide the validity of
coordination-avoidance conditions. We present a tool called HAMPA that, given
an object definition,
analyzes the object and instantiates the protocol to synthesize replicated objects.
Our experiments with the synthesized objects show that the staleness bound has 
an inverse relationship with the coordination and response time. 

In summary, this paper presents the following contributions: (1) A relational 
object specification language that captures integrity and recency declarations, 
and its denotational semantics (Sect. 2). (2) The coordination conditions and the 
operational semantics of replicated systems that simultaneously preserve conver- 
gence, integrity and recency (Sects. 3 and 4). (3) A syntax-directed analysis that 
infers optimum staleness bounds for each element of the state (Sect. 5). (4) The 
characterization of the computational power and a protocol for recency-aware
replicated objects (Sect. 6). (5) The HAMPA replicated object synthesis tool and
its experimental results (Sect. 7). All the proofs are available in the appendix [5]. 


2 Recency-Aware Relational Object Language 


Language. Figure 1 shows our core relational language for object specification.
An object is a record ⟨Σ, I, M⟩ that includes a state type Σ, an invariant I
on the state, and a set of methods M. The state can be a tuple of natural
number Nat and relation Rel types. The invariant I is a boolean function on
the state. A method m is a function from the parameter x and the pre-state
⟨x1, .., xn⟩ to a record ⟨e_g, e_u, e_r⟩. The guard e_g is a boolean expression that
captures the semantic preconditions of m such as conditions on the arguments.
The expressions e_u and e_r are for the post-state and the return value. We use
guard, update and retv as functions that extract elements of this record. For each
method, the user declares an integer as the staleness bound ε for its return value.
A method call c is a method applied to its argument, i.e., it is a function from
the current state to a record ⟨e_g, e_u, e_r⟩.

An expression e is either a value v (that can be either a number n or a relation
R), a variable denoted by x, an application of the operators {+, −, =, <, &, !}
to operand expressions where & is the conjunction and ! is the negation
operator, a selection σ_{λ⟨x̄⟩. e}(e′) that binds the attributes of each element of the
relation e′ to the variables x̄ and returns the elements that satisfy the condition
e, a projection Π_{λ⟨x̄⟩. ⟨ē⟩}(e′) that for each element of the relation e′, binds its
attributes to the variables x̄ and calculates a tuple of elements ⟨ē⟩ and returns
the set of resulting tuples, a union e ∪ e′ that results in a relation with elements
of both of the relations e and e′, a difference e \ e′ that results in a relation
with the elements in the relation e that are not in the relation e′, and the
Cartesian product e × e′ that results in a relation with pair elements where
the first and second elements are in the relations e and e′ respectively. The
language supports a complete set of relational operators: any relational algebra
expression can be expressed by a combination of them. Selection (σ), projection
(Π), union (∪), difference (\), product (×) and renaming (ρ) are a complete set of
operators. We note that since the language uses functions with argument names,
a renaming operator is unnecessary. The update and join operations are defined
as syntactic sugar. The update operation U_{λ⟨x̄⟩. ⟨e, ⟨ē′⟩⟩} e″ returns a relation that
updates each element of e″ that satisfies the condition e to the tuple ⟨ē′⟩. The
join e1 ⋈_{λ⟨x̄1, x̄2⟩. e} e2 results in pairs of elements of e1 and e2 that satisfy the
condition e.
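
To make the operators concrete, the following Python sketch models relations as sets of tuples and implements selection, projection, union, difference and product, with update and join derived from them. This is only an illustration under our own encoding: the function names and the use of Python lambdas for the bound variables are assumptions, not part of the specification language.

```python
from itertools import product as cartesian_product


def select(cond, rel):
    """Selection: keep the tuples whose attributes satisfy cond."""
    return {t for t in rel if cond(*t)}


def project(fn, rel):
    """Projection: map every tuple to the tuple computed by fn."""
    return {fn(*t) for t in rel}


def union(r1, r2):
    return r1 | r2


def difference(r1, r2):
    return r1 - r2


def product(r1, r2):
    """Cartesian product: pairs (t1, t2) with t1 from r1 and t2 from r2."""
    return set(cartesian_product(r1, r2))


def update(cond, fn, rel):
    """Update (sugar): rewrite the tuples that satisfy cond using fn."""
    selected = select(cond, rel)
    return union(difference(rel, selected), project(fn, selected))


def join(cond, r1, r2):
    """Join (sugar): selection over the Cartesian product."""
    return {(t1, t2) for t1, t2 in product(r1, r2) if cond(t1, t2)}


# Hypothetical data: movies with available space, and one reservation.
ms = {("m1", 3), ("m2", 0)}
rs = {("u1", "m1")}
decremented = update(lambda m, a: m == "m1", lambda m, a: (m, a - 1), ms)
booked = join(lambda r, mv: r[1] == mv[0], rs, ms)
```

Here `decremented` lowers the space of movie m1 by one and leaves m2 untouched, and `booked` pairs the reservation with its movie tuple.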


Syntax:

o  := ⟨Σ, I, M⟩                                   Object
Σ  := ⟨T, .., T⟩                                  State
T  := Nat | Rel
I                                                 Invariant
M  := me .. me                                    Methods
me := def ε m(x)(⟨x1, .., xn⟩) ⟨e_g, e_u, e_r⟩    Method
e  :=                                             Expression
      v                                           Value
    | x                                           Variable
    | e + e | e − e                               Math
    | e = e | e < e
    | e & e | !e
    | σ_{λ⟨x̄⟩. e}(e)                              Selection
    | Π_{λ⟨x̄⟩. ⟨ē⟩}(e)                            Projection
    | e ∪ e                                       Union
    | e \ e                                       Difference
    | e × e                                       Product
v  := n | true | false | R                        Value

Syntactic sugar:

U_{λ⟨x̄⟩. ⟨e, ⟨ē′⟩⟩} e″                                   Update
e1 ⋈_{λ⟨x̄1, x̄2⟩. e} e2 ≜ σ_{λ⟨x̄1, x̄2⟩. e}(e1 × e2)     Join

Semantics:

⟦v⟧ = v
⟦e ⊕ e′⟧ = ⟦e⟧ ⊕ ⟦e′⟧
⟦!e⟧ = !⟦e⟧
⟦σ_{λ⟨x̄⟩. e′}(e)⟧ = {t | t ∈ ⟦e⟧ ∧ ⟦e′[x̄ := t]⟧ = true}
⟦Π_{λ⟨x̄⟩. ⟨ē⟩}(e′)⟧ = {⟦ē[x̄ := t]⟧ | t ∈ ⟦e′⟧}
⟦e ∪ e′⟧ = ⟦e⟧ ∪ ⟦e′⟧
⟦e \ e′⟧ = ⟦e⟧ \ ⟦e′⟧
⟦e × e′⟧ = ⟦e⟧ × ⟦e′⟧


Fig. 1. Syntax and semantics of the specification language 


Class MovieBooking

Σ := let rs := Set N × N in     ▷ Reservation: user identifier and movie identifier
     let ms := Set N × N in     ▷ Movie: movie identifier and available space
     ⟨rs, ms⟩

I := λ⟨rs, ms⟩. unique(ms, λ⟨m, a⟩. m) ∧
     refIntegrity(rs, λ⟨u, m⟩. m, ms, λ⟨m, a⟩. m) ∧
     rowIntegrity(ms, λ⟨m, a⟩. a ≥ 0)

book(⟨u, m⟩) := 0 λ⟨rs, ms⟩.
    ⟨…, ⟨rs ∪ ⟨u, m⟩, U_{λ⟨m′, a⟩. ⟨m′ = m, ⟨m, a − 1⟩⟩} ms⟩, ⊥⟩
cancelBook(⟨u, m⟩) := 0 λ⟨rs, ms⟩.
    ⟨True, ⟨rs \ ⟨u, m⟩, U_{λ⟨m′, a⟩. ⟨m′ = m, ⟨m, a + 1⟩⟩} ms⟩, ⊥⟩
offScreen(m) := 0 λ⟨rs, ms⟩.
    ⟨True, ⟨rs, ms \ σ_{λ⟨m′, a⟩. m′ = m} ms⟩, ⊥⟩
specialReserve(⟨m, n⟩) := 0 λ⟨rs, ms⟩.
    ⟨n > 0, ⟨rs, U_{λ⟨m′, a⟩. ⟨m′ = m, ⟨m, a − n⟩⟩} ms⟩, ⊥⟩
increaseSpace(⟨m, n⟩) := 0 λ⟨rs, ms⟩.
    ⟨n > 0, ⟨rs, U_{λ⟨m′, a⟩. ⟨m′ = m, ⟨m, a + n⟩⟩} ms⟩, ⊥⟩
querySpace(m) := ε1 λ⟨rs, ms⟩.
    ⟨True, ⟨rs, ms⟩, Π_{λ⟨m′, a⟩. ⟨a⟩}(σ_{λ⟨m′, a⟩. m′ = m} ms)⟩
queryReservations(u) := ε2 λ⟨rs, ms⟩.
    ⟨True, ⟨rs, ms⟩, Π_{λ⟨u′, m⟩. ⟨m⟩}(σ_{λ⟨u′, m⟩. u′ = u} rs)⟩
querySpaces(u) := ε3 λ⟨rs, ms⟩.
    ⟨True, ⟨rs, ms⟩, Π_{λ⟨u, m, m′, a⟩. ⟨m, a⟩}(rs ⋈_{λ⟨u, m⟩, ⟨m′, a⟩. m = m′} ms)⟩


Fig. 2. Movie booking use-case


Semantics. Figure 1 presents a denotational semantics for expressions. The
semantics for values, variables, and binary and unary operations is standard.
The semantics of the selection expression σ_{λ⟨x̄⟩. e′}(e) is the set of tuples t in
the semantics of e such that substitution of the attributes x̄ in e′ with their
corresponding values in t evaluates to true. The semantics of the projection
expression Π_{λ⟨x̄⟩. ⟨ē⟩}(e′) is a set of tuples, one for each tuple t in the seman-
tics of e′: the tuple that results from substituting x̄ with t in the expressions ē and
evaluating them. The semantics of union, difference and product are standard
from set theory. We define the difference Δ between two values as follows:
the difference between two natural numbers is the absolute value of their sub-
traction, i.e., Δ(n, n′) = |n − n′|; the difference of two relations is the size of
their symmetric difference, i.e., Δ(R, R′) = |R \ R′| + |R′ \ R|. We use delta δ to
represent the staleness of a value, that is, the difference between the value and
its target value. The delta for a completely recent (or exact) value is zero. For
a call c, the weight weight(c) is a bound on the difference that the execution of
c can make on the state of the object. In other words, for every call c, we have
∀σ. let ⟨_, σ′, _⟩ := c(σ) in Δ(σ′, σ) ≤ weight(c).
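
The difference measure and the weight condition can be illustrated with a few lines of Python. The sketch below assumes that a state is a single natural number or a single relation (a set of tuples) and that a call returns a triple of guard, post-state and return value; the names `delta`, `respects_weight` and `add_two` are hypothetical, and the weight is only tested on a finite sample of pre-states rather than proved for all states.

```python
def delta(value, other):
    """Difference between two natural numbers or between two relations."""
    if isinstance(value, (set, frozenset)):
        # Size of the symmetric difference of the two relations.
        return len(value - other) + len(other - value)
    return abs(value - other)


def respects_weight(call, states, weight):
    """Check delta(post, pre) <= weight on the given sample of pre-states.

    `call` maps a pre-state to a triple (guard, post_state, return_value).
    """
    return all(delta(call(pre)[1], pre) <= weight for pre in states)


def add_two(n):
    """A toy call on a numeric state: always enabled, adds 2, returns nothing."""
    return (True, n + 2, None)
```

On this toy call a weight of 2 is respected, whereas a weight of 1 is not.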


Running Use-Case. Figure 2 shows the movie booking use-case. The state of 
the object is the two relations reservation rs and movie ms. The reservation 
relation rs stores the movies that the users have booked; it is the pairs of users 
u and movies m. The movie relation ms stores the number of available spaces for 
each movie; it is the pairs of movies m and spaces a. The integrity property I is
a conjunction of three conditions: (1) Each movie in ms should be unique. (2)
Referential integrity requires that every movie in rs exists in ms. (3) The number
of available spaces for every movie should be non-negative. The object provides
five update methods and three query methods. Given a user u and a movie m, the
method book adds the pair to rs and decrements the available spaces for m in ms. 
Similarly, the method cancelBook removes a reservation and increments available 
spaces. Given a movie m, the method offScreen removes the corresponding tuple 
from ms. Given a movie m and a number n, the method specialReserve subtracts 
n from the available spaces for m in ms. The dual method increaseSpace adds n 
to the spaces for m. Given a movie m, the method querySpace returns the number 
of available spaces for m. The method queryReservations returns the set of movies 
that the given user has booked. Given a user u, the method querySpaces returns 
the pairs of movies and their available spaces for the movies that u has booked. 
The staleness bound for the update methods is specified as 0. The returned none
constant ⊥ is always exact. The bound values ε1, ε2 and ε3 of the query methods
represent the number of tuples that are different between the current state and 
the pending stable state of the result relation. 
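
As a rough executable reading of the use-case, the following Python sketch encodes the state as a pair of frozen sets, the integrity property as a predicate, and the book method as a function from a state to a triple of guard, post-state and return value. It is an assumption-laden illustration: the helper names are ours, and the guard of book is fixed to True as a placeholder rather than taken from the specification.

```python
def unique(rel, key):
    """Every key value occurs in at most one tuple of rel."""
    keys = [key(*t) for t in rel]
    return len(keys) == len(set(keys))


def ref_integrity(rel, key, target, target_key):
    """Every key of rel also occurs as a key of target."""
    return {key(*t) for t in rel} <= {target_key(*t) for t in target}


def row_integrity(rel, pred):
    """Every tuple of rel satisfies pred."""
    return all(pred(*t) for t in rel)


def invariant(state):
    rs, ms = state
    return (unique(ms, lambda m, a: m)
            and ref_integrity(rs, lambda u, m: m, ms, lambda m, a: m)
            and row_integrity(ms, lambda m, a: a >= 0))


def book(u, m):
    """Add reservation (u, m) and decrement the available space of m."""
    def call(state):
        rs, ms = state
        new_ms = frozenset((m2, a - 1) if m2 == m else (m2, a) for m2, a in ms)
        guard = True  # placeholder assumption, not the guard of the use-case
        return (guard, (rs | {(u, m)}, new_ms), None)
    return call
```

Booking the last available space keeps the invariant, while a further booking of the same movie drives the space below zero and violates it; this is the kind of conflict that coordination has to prevent.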


[Figure: timelines of replicas rep1 and rep2 with calls to increaseSpace (i),
specialReserve (s) and book (b); panel (b) shows block{b,c,o,s} and unblock
around the book call. Arrows mark requests, indications and synchronization.]


Fig. 3. (a) Buffering and coordination. Example execution (b) without and (c) with
recency.


To reduce communication, certain calls can be executed locally and buffered, 
and the buffer can be communicated to other replicas later. As an example, 
in Fig. 3(a), the first two calls to the method increaseSpace do not exceed the 
staleness bound for ms and can be buffered. However, the third call exceeds the 
bound and cannot be added to the buffer. Therefore, the buffer is flushed to 
other replicas and the third call is blocked until an acknowledgement for the 
delivery of the buffer is received. All the calls of the buffer can be sent in a single 
message and the acknowledgement for them can be sent in a single message as 
well. 
Let us now consider the interaction of buffering with coordination. We will
see that buffering (staleness) reduces the coordination required for
conflicts. (We will define the conflicting calls that should be synchronized
in Sect. 3.) Fig. 3(b) and (c) show the same execution without and with buffering
respectively. In Fig. 3(b), the first replica rep1 executes the sequence of calls
increaseSpace, specialReserve and increaseSpace. The method increaseSpace does
not conflict with any other method; therefore, calls to it are simply broadcast.
The method specialReserve conflicts with itself and the method book; therefore,
the call to it goes through synchronization. The second replica rep2 calls
book, which conflicts with four other methods. Hence, it should synchronize. (The
synchronization reaches the other replicas, blocks calls to the four methods, and
propagates previous calls to those methods.) In this example, the conflicting
specialReserve call in rep1 should be propagated to rep2 before the book call can
be executed.

In Fig. 3(c), the recency bound allows the three calls of rep1 to be buffered.
Replicas use SMT solvers at runtime to check the validity of three prop- 
erties for the buffers: all-S-commutativity, invariant-sufficiency and let-P-R- 
commutativity that we will formally define in Sect. 3. In this example, the buffer 
is invariant-sufficient if the number of spaces that the call specialReserve decre- 
ments is less than the number that the increaseSpace calls increment. Therefore, 
the buffer can be sent to other replicas without any additional synchroniza- 
tion; the invariant in the pre-state is sufficient for the invariant in its post-state. 
We note that the call specialReserve that previously went through synchroniza- 
tion does not need any synchronization inside the buffer. Further, the let-P-R- 
commutativity property of the buffer guarantees that the book call will preserve 
the integrity after the buffer. Thus, the synchronization of the book call that 
previously waited for the specialReserve call does not need to wait anymore. 


3 Coordination Conditions 


In this section, we present the coordination conditions for replicated objects 
that preserve the three properties: convergence, integrity and recency. The state 
of the given sequential object is replicated across replicas. Clients can request 
method calls at every replica, and replicas coordinate the calls. Convergence is 
the safety property that when all pending updates are processed, the replicas 
converge to the same state. Integrity is the safety property that every method 
call is executed only on a state where the guard of the method and the invariant 
are satisfied. Recency is the safety property that bounds the difference between 
the state of a replica and its impending state after the pending calls are applied. 

The state of each replica is initialized to the same state $\sigma_0$ that satisfies the
invariant $\mathcal{I}$. The replica that accepts the request for a call from the user is called
the originating replica of the call. We uniquely identify requests by identifiers r.
We use the two maps call and orig that map request identifiers to the method
call and originating replica respectively. The execution history of a replica is
modeled as a permutation of a set of request identifiers. An execution x of a
set of requests R is a bijection from positions $[0..|R| - 1]$ to R. We denote the
range of x as R(x). An execution x of R defines the total order $<_x$ on R: a
request r precedes another request r' in an execution x, written as $r <_x r'$, iff
$x^{-1}(r) < x^{-1}(r')$. A replicated execution xs is a function from replicas $\mathcal{N}$ to
executions. The post-state of each call at a replica is the result of applying the
call to its pre-state.
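
The execution model above can be made concrete with a small Python sketch. It assumes that an execution is represented as a list (position to request identifier), so that the inverse $x^{-1}$ is a list lookup; the identifiers r1, r2, r3 and the replica names are hypothetical.

```python
def is_execution(x, requests):
    """x is an execution of `requests` iff it is a bijection from positions to requests."""
    return len(x) == len(requests) and set(x) == set(requests)


def precedes(x, r1, r2):
    """r1 <_x r2 iff x^{-1}(r1) < x^{-1}(r2)."""
    return x.index(r1) < x.index(r2)


R = {"r1", "r2", "r3"}

# A replicated execution: one execution per replica.
xs = {
    "n1": ["r2", "r1", "r3"],
    "n2": ["r1", "r2", "r3"],
}

all_valid = all(is_execution(x, R) for x in xs.values())
order_n1 = precedes(xs["n1"], "r2", "r1")  # r2 precedes r1 at n1
order_n2 = precedes(xs["n2"], "r2", "r1")  # but not at n2
```

The two replicas order r1 and r2 differently, which is harmless only if the corresponding calls do not conflict.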

We first revisit the coordination conditions for convergence and integrity [30], 
and then present coordination conditions for recency and their impact on the 
prior conditions. 


Convergence. A replicated execution is convergent if the state of the replicas is 
the same after all the calls are propagated. Out of order delivery of method calls 
at different replicas can lead to divergence of their states. Method calls such as 
special reservation specialReserve and increasing space increaseSpace result in the 
same state if their order of execution is swapped. However, the resulting state 
of the two method calls book and cancelBook is dependent on their execution 
order. Therefore, they should synchronize. 


Definition 1 (State-Commutativity and State-Conflict). Two method
calls $c_1$ and $c_2$ S-commute, written as $c_1 \leftrightarrows_S c_2$, iff for every state $\sigma$,
$update(c_2)(update(c_1)(\sigma)) = update(c_1)(update(c_2)(\sigma))$. Otherwise, they
S-conflict, written as $c_1 \bowtie_S c_2$.
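
Definition 1 can be checked by brute force on a small state space. The following Python sketch does so for the running example; it is not the solver-based check used by the replicas, and the update functions, the single movie "A", the single user "u" and the enumerated states are assumptions made for illustration.

```python
def freeze(rs, ms):
    """Build a hashable, comparable state from reservations and spaces."""
    return (frozenset(rs), tuple(sorted(ms.items())))


def change_space(m, delta):
    def update(state):
        rs, ms = state[0], dict(state[1])
        if m in ms:
            ms[m] += delta
        return freeze(rs, ms)
    return update


def increase_space(m, n):
    return change_space(m, n)


def special_reserve(m, n):
    return change_space(m, -n)


def book(u, m):
    def update(state):
        rs, ms = set(state[0]), dict(state[1])
        rs.add((u, m))
        if m in ms:
            ms[m] -= 1
        return freeze(rs, ms)
    return update


def cancel_book(u, m):
    def update(state):
        rs, ms = set(state[0]), dict(state[1])
        if (u, m) in rs:  # assumed: cancelling a missing reservation is a no-op
            rs.remove((u, m))
            if m in ms:
                ms[m] += 1
        return freeze(rs, ms)
    return update


def s_commute(c1, c2, states):
    """c1 and c2 S-commute iff both orders yield the same state from every state."""
    return all(c2(c1(s)) == c1(c2(s)) for s in states)


STATES = [freeze(rs, ms)
          for rs in (set(), {("u", "A")})
          for ms in ({}, {"A": 0}, {"A": 1}, {"A": 2})]

reserve_vs_increase = s_commute(special_reserve("A", 1), increase_space("A", 2), STATES)
book_vs_cancel = s_commute(book("u", "A"), cancel_book("u", "A"), STATES)
```

Under these assumptions, specialReserve and increaseSpace S-commute, while book and cancelBook do not: cancelling before booking leaves the reservation in place.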


Integrity. The body of each method relies on the invariant in the pre-state. 
Further, methods have explicit guards that declare their pre-conditions. We say 
that a method call enjoys integrity at a state if the invariant and the guard of 
the method hold in that state. 


Definition 2 (Integrity). A method call c enjoys integrity in a state $\sigma$, written
as $integrity(\sigma, c)$, iff $guard(c)(\sigma)$ and $\mathcal{I}(\sigma)$.


Method calls should be executed only in states that they have integrity in. The 
integrity condition is simply lifted to executions and replicated executions: An 
execution enjoys integrity iff every request in it enjoys integrity. 


Definition 3 (Permissibility). A method call c is permissible in a state $\sigma$,
written as $P(\sigma, c)$, iff $guard(c)(\sigma)$ and $\mathcal{I}(update(c)(\sigma))$.


In contrast to integrity that requires the invariant to hold in the pre-state, 
permissibility requires it to hold in the post-state. The post-state of a call is the 
pre-state of the next call in a replica. Further, the initial state is assumed to 
satisfy the invariant. Therefore, if every call is permissible in its pre-state, then 
every call enjoys integrity. By induction, permissibility leads to integrity. 

To execute a method call, we check that it is permissible at its originating 
replica. Thus, we say that each method call is locally permissible. Otherwise, the 
call is aborted or delayed. Still, if the call is simply broadcast, it is not necessarily 
permissible when it arrives at other replicas. Some calls need coordination. 
Conflict. There are calls such as increaseSpace that are always permissible as
long as they are applied to a state that satisfies the invariant. Increasing the space
cannot result in a missing or duplicate movie or a negative number for available 
spaces. Thus, if it is broadcast and executed on another replica, it is sufficient 
that the pre-state satisfies the invariant to preserve it in the post-state. 


Definition 4 (Invariant-Sufficient). A call c is invariant-sufficient iff for
every state $\sigma$, if $\mathcal{I}(\sigma)$ then $P(\sigma, c)$.


However, not all calls are invariant-sufficient. For example, a book call may 
be permissible in a replica but may become impermissible in another when it 
is executed after an already executed offScreen call for the same movie. These 
two calls should synchronize to preserve integrity. Nonetheless, some pairs of 
calls such as offScreen and specialReserve do not affect each other’s permissibil- 
ity. (In the running example, specialReserve has no guards. After an offScreen 
call, it remains permissible as it doesn’t find the movie and leaves the relation 
unchanged). 


Definition 5 (Permissible-Right-Commutativity). The call $c_1$ P-R-commutes
with the call $c_2$, written as $c_1 \rhd_P c_2$, iff for every state $\sigma$, if $P(\sigma, c_1)$
then $P(update(c_2)(\sigma), c_1)$.


If a call $c_1$ is invariant-sufficient or P-R-commutes with another call $c_2$, then the
call $c_1$ will stay permissible when it is propagated and applied to another replica
even if $c_2$ is executed before it in that replica.


Definition 6 (Permissible-Concur and Permissible-Conflict). A call $c_1$
P-concurs with a call $c_2$ iff $c_1$ is invariant-sufficient or $c_1 \rhd_P c_2$. Otherwise, $c_1$
P-conflicts with $c_2$.


The call offScreen P-concurs with the call specialReserve; however, the call 
book P-conflicts with the call offScreen. 
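
Definitions 3 to 6 can also be exercised by enumeration. The Python sketch below is an illustration rather than the solver-based check: it assumes that all guards are true (so permissibility reduces to the invariant holding in the post-state), and the update functions, the movie "A", the user "u" and the enumerated states are hypothetical.

```python
def freeze(rs, ms):
    return (frozenset(rs), tuple(sorted(ms.items())))


def call(effect):
    """Turn an in-place effect on (rs, ms) into a pure state update."""
    def update(state):
        rs, ms = set(state[0]), dict(state[1])
        effect(rs, ms)
        return freeze(rs, ms)
    return update


def book(u, m):
    def effect(rs, ms):
        rs.add((u, m))
        if m in ms:
            ms[m] -= 1
    return call(effect)


def off_screen(m):
    return call(lambda rs, ms: ms.pop(m, None))


def change_space(m, delta):  # increaseSpace for delta > 0, specialReserve for delta < 0
    def effect(rs, ms):
        if m in ms:
            ms[m] += delta
    return call(effect)


def invariant(state):
    rs, ms = state[0], dict(state[1])
    return all(m in ms for (_, m) in rs) and all(a >= 0 for a in ms.values())


def permissible(state, c):
    return invariant(c(state))  # guards are assumed to be true


def invariant_sufficient(c, states):
    return all(permissible(s, c) for s in states if invariant(s))


def p_r_commutes(c1, c2, states):
    return all(permissible(c2(s), c1) for s in states if permissible(s, c1))


def p_concurs(c1, c2, states):
    return invariant_sufficient(c1, states) or p_r_commutes(c1, c2, states)


STATES = [freeze(rs, ms)
          for rs in (set(), {("u", "A")})
          for ms in ({}, {"A": 0}, {"A": 1}, {"A": 2})]

increase_is_sufficient = invariant_sufficient(change_space("A", 1), STATES)
book_is_sufficient = invariant_sufficient(book("u", "A"), STATES)
off_concurs_reserve = p_concurs(off_screen("A"), change_space("A", -1), STATES)
book_concurs_off = p_concurs(book("u", "A"), off_screen("A"), STATES)
```

On this state space the results agree with the text: increaseSpace is invariant-sufficient, offScreen P-concurs with specialReserve, and book P-conflicts with offScreen.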

We say that two calls concur iff they both S-commute and P-concur with 
each other. Otherwise, we say they conflict and need synchronization. 


Definition 7 (Concur and Conflict). A pair of calls $c_1$ and $c_2$ concur iff
they S-commute and P-concur with each other. Otherwise, they conflict, written as $c_1 \bowtie c_2$.


Dependency. As we saw above, invariant-sufficient method calls can always 
preserve the invariant. However, there are calls whose preservation of the invari- 
ant is dependent on the calls that have executed before them at that replica. For 
example, taking the movie off-screen offScreen is dependent on cancelling the last 
booking cancelBook. If offScreen is moved left before cancelBook, it can become 
impermissible. Nonetheless, taking a movie off-screen offScreen is independent 
of the previous special reservations specialReserve. 


Definition 8 (Permissible-Left-Commutative). A call $c_2$ P-L-commutes with a
call $c_1$, written as $c_2 \lhd_P c_1$, iff for every state $\sigma$, if $P(update(c_1)(\sigma), c_2)$ then $P(\sigma, c_2)$.

A call can avoid tracking dependencies on another call if the former is
invariant-sufficient or P-L-commutes with the latter.


Definition 9 (Independent and Dependent). A call $c_2$ is independent of
$c_1$, written as $c_2 \perp\!\!\!\perp c_1$, iff either $c_2$ is invariant-sufficient or $c_2 \lhd_P c_1$. Otherwise,
$c_2$ is dependent on $c_1$, written as $c_2 \not\!\perp\!\!\!\perp c_1$.


If $c_1$ is executed before $c_2$ in the originating replica of $c_2$ and $c_2$ is dependent
on $c_1$, then $c_2$ should be applied to other replicas only if $c_1$ is already applied.


Recency. Calls executed at a replica may be delayed in the network before they 
are executed in other replicas. Further, they may be buffered at the originating 
replica to reduce communication. The pending calls for a replica are the calls 
that have executed in other replicas but not at that replica yet. The staleness of a 
replica is the difference of its current state and its state after applying its pending 
calls. Given a bound $\epsilon$, a replica is sufficiently recent if its staleness is less than $\epsilon$.
The calls that have originated in the current replica n but have not been received
yet by another replica n' make the state of n' stale. To bound the staleness of
n' by $\epsilon$, the staleness imposed on n' by the calls originated by each of the other
$|\mathcal{N}| - 1$ replicas should be bounded by $\epsilon/(|\mathcal{N}| - 1)$. The difference that these
calls can make is bounded by the sum of their weights (defined in Sect. 2). The
staleness bound can be evenly divided between the replicas. However, in general 
it can be distributed unevenly and even dynamically. In particular, replicas that 
tend to issue updates more often can get a larger share. 
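
The even split of the staleness bound can be sketched as follows. The Python code is a minimal illustration: the weight function, the replica names and the pending sets are hypothetical, and only the even split $\epsilon/(|\mathcal{N}| - 1)$ is modeled, not an uneven or dynamic distribution.

```python
def per_origin_quota(epsilon, num_replicas):
    """Share of the bound epsilon granted to each originating replica (even split)."""
    return epsilon / (num_replicas - 1)


def in_bound(pending, weight, epsilon, num_replicas):
    """pending maps each other replica to the calls that originated at the
    current replica and that it has not received yet."""
    quota = per_origin_quota(epsilon, num_replicas)
    return all(sum(weight(c) for c in calls) <= quota
               for calls in pending.values())


def unit_weight(call):
    return 1  # assumed: every call changes at most one tuple


quota = per_origin_quota(6, 4)

pending_ok = {"n2": ["c1", "c2"], "n3": ["c1"], "n4": []}
pending_bad = {"n2": ["c1", "c2", "c3"], "n3": ["c1"], "n4": []}

ok = in_bound(pending_ok, unit_weight, epsilon=6, num_replicas=4)
bad = in_bound(pending_bad, unit_weight, epsilon=6, num_replicas=4)
```

With a bound of 6 and four replicas, each origin may keep calls of total weight at most 2 pending at any other replica; a third pending call for n2 exceeds that quota.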

Given a recency bound, a buffering quota can be calculated for each replica 
and the recency bound can be preserved when calls are buffered. Buffering calls 
can reduce communication; however, it can affect the convergence and integrity 
properties. To preserve these properties a buffer should have three properties: all- 
state-commutativity, invariant-sufficiency and let-P-R-commutativity. We con- 
sider each condition in turn. 


Definition 10 (All-State-Commutative). A call is all-S-commutative if it 
is S-commutative with respect to every call. 


The calls of the buffer are executed locally and are not synchronized with 
other replicas. Therefore, if the buffer is not all-S-commutative, concurrent exe- 
cution of S-conflicting calls in other replicas can lead to divergence. Similarly, if 
the buffer is not invariant-sufficient, concurrent execution of P-conflicting calls 
in other replicas can lead to impermissibility of the buffer when it is propagated 
and executed in other replicas. The buffer in Fig.3(c) is all-S-commutative: 
it includes increaseSpace and specialReserve calls that result in increasing or 
decreasing the space for movies; the result is S-commutative with respect to 
all method calls. Further, it is invariant-sufficient if the net result of its calls 
is a non-negative addition to the space of each movie. For example, if the 
increaseSpace calls add s spaces and the specialReserve calls subtract s’ spaces 
from the same movie where s’ < s, then the net effect is adding spaces and the 
buffer is invariant-sufficient. 
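
For the buffers of the running example, the net-effect argument above can be sketched directly. The Python code below is a simplification specific to buffers that contain only increaseSpace and specialReserve calls; the general check is delegated to an SMT solver in the text, and the tuple encoding of calls is an assumption.

```python
from collections import defaultdict


def net_effect(buffer):
    """Net change of available spaces per movie for a buffer of
    (method, movie, amount) calls."""
    net = defaultdict(int)
    for method, movie, amount in buffer:
        if method == "increaseSpace":
            net[movie] += amount
        elif method == "specialReserve":
            net[movie] -= amount
        else:
            raise ValueError("unsupported call in this sketch: " + method)
    return dict(net)


def buffer_invariant_sufficient(buffer):
    """The buffer preserves the invariant if it never removes spaces on balance."""
    return all(delta >= 0 for delta in net_effect(buffer).values())


good_buffer = [("increaseSpace", "m", 2),
               ("specialReserve", "m", 1),
               ("increaseSpace", "m2", 3)]
bad_buffer = [("increaseSpace", "m", 1),
              ("specialReserve", "m", 2)]

good = buffer_invariant_sufficient(good_buffer)
bad = buffer_invariant_sufficient(bad_buffer)
```
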
Definition 11 (Let-Permissible-Right-Commutative). A call is let-P-R-commutative
if every call P-R-commutes with it.


Calls in other replicas are checked to be permissible with no knowledge of 
the buffered calls in the current replica. Let-P-R-commutativity of the buffer of 
the current replica guarantees that the calls in other replicas will continue to be 
permissible once they are propagated and executed after the buffer in the current 
replica. The buffer in Fig. 3(c) is let-P-R-commutative; it can only increase the
number of spaces, which cannot make any call impermissible.


4 Replicated System Semantics 


In this section, we define the operational semantics of replicated objects where
(1) the integrity property $\mathcal{I}$ on the state of each replica is always preserved, (2)
replicas converge to the same state once all the calls are propagated, and (3) the
staleness of each replica is always bounded by $\epsilon$. The semantics declares the
conditions for execution and propagation of method calls on the replicated object to
guarantee the three properties. In particular, it represents the conditions for local
buffering of method calls to avoid communication while preserving the recency
of the other replicas. In Sect. 5, we will see a static analysis that infers staleness
bounds for the state. In this section, the semantics preserves the inferred staleness
bound $\epsilon$ for the state $\sigma$ of the object. (For objects with multiple pieces of
state, the staleness of each piece can be tracked separately.) The semantics strives
to concisely define the conditions; we will present the protocols that implement
these conditions in Sect. 6.

As Fig. 4 shows, the global state of the replicated system is represented
as a world w that is a tuple (h, t, xs, orig, call). The hosts h is a
mapping from replica identifiers $\mathcal{N}$ to the local state of replicas. Each call
is assigned a unique request identifier r at the originating replica. The
two maps call and orig keep a mapping from request identifiers to the call and
the originating replica of the request respectively. The state of each replica
is a statement $s \in S$, the state of the object $\sigma \in \Sigma$, and the identifier $r \in R$
of the current buffer. A statement s is either $x \leftarrow c; s'$ that is the sequence of
a call c and another statement s', or the terminal statement skip. A call c is
the application of a method m to an argument expression e. A call can also
be the identity call id that leaves the state unchanged. (It is assumed that client
statements do not make id calls.) The network t is the set of packets that are
sent but not yet delivered. A packet p contains the identifier of the destination
replica n and the request identifier r of the call. If a packet is transmitting a
buffered call, it is decorated with an asterisk $*$. The history xs is a mapping from
replica identifiers $\mathcal{N}$ to the list of request identifiers of the calls that are
previously applied to that replica. The initial value of the world state is $w_0$ where
each replica n holds its initial statement $s_n$, the initial state $\sigma_0$ of the object
that satisfies the integrity property $\mathcal{I}$, and an empty buffer. Empty buffers are
represented by mapping the buffer identifier $r_n$ of each replica n to the identity
call id.

    $w := (h, t, xs, orig, call)$                        World
    $h : \mathcal{N} \to S \times \Sigma \times R$       Hosts
    $n : \mathcal{N}$                                    Replica nodes
    $s : S := x \leftarrow c; s \mid skip$               Statement
    $c : C := m(e) \mid id$                              Call
    $m : M$                                              Method
    $e := x \mid v$                                      Expression
    $x$                                                  Variable
    $v$                                                  Value
    $\sigma : \Sigma$                                    Object State
    $r : R$                                              Request
    $t : Set\ P$                                         Transit
    $p : P := (n, r) \mid (n, r^*)$                      Packet
    $xs : \mathcal{N} \to List\ R$                       History
    $orig : R \to \mathcal{N}$                           Original node
    $call : R \to C$                                     Request call
    $w_0 := ([n \mapsto (s_n, \sigma_0, r_n)]_{n \in \mathcal{N}}, \emptyset, \emptyset, [r_n \mapsto n]_{n \in \mathcal{N}}, [r_n \mapsto id]_{n \in \mathcal{N}})$    Init World

Fig. 4. Operational semantics state

Figure 5 presents the operational semantics. The rule CALL executes a
method call c at a replica n. The call c can be executed if the following conditions
hold. (1) To preserve integrity, the call c should be locally permissible, $P(\sigma, c)$, in
the current state $\sigma$. (2) To preserve convergence and integrity, any pair of
conflicting calls should have the same order across the replicas, a property that we
call conflict-synchronization. Thus, to execute a new request r, the rule CALL
requires the condition ConflictSyncInit: any call r' that is already executed in
another replica n' and conflicts with the current call r should have been already
executed in the current replica n. Otherwise, once the calls r and r' are propagated
and executed on the other replicas, they will have different orders in the
two replicas n and n'. (3) To preserve recency, this rule requires the condition
InBound: the difference that the pending calls from the current replica n can
make to the state of every other replica n' should be bounded by $\epsilon/(|\mathcal{N}| - 1)$. If
the conditions above hold, a fresh identifier r is created for the call, the history 
xs and the maps orig and call are updated to reflect the new call, a packet is sent 
in the network t to every other replica, and the variable x is substituted with 
the returned value v of the call in the continuation statement s of the current 
replica. 

The rule DELIVER delivers a call that has been sent to the current replica. It 
requires two conditions: conflict-synchronization and dependency-preservation. 
(1) Similar to the rule CALL, conflict-synchronization requires ConflictSync: if 
a conflicting call r’ is executed before the received call r in another replica n’, 
then r’ should have been already executed before r in n as well. (2) To preserve 
integrity, the dependencies of calls should be preserved. Thus, the dependency-preservation
condition DepPres requires that a call r originated from a replica n'
is executed in the current replica n only if every call r' that was executed
before r in n' and that r depends on has already been executed in n.

Recency-aware replication can be applied to any object, but it can improve
performance when there are method calls that can be buffered. The rule CALLLOCAL
executes a call but locally buffers it. Similar to the rule CALL, it
first checks the local permissibility of the call c. Since a buffered call is not
immediately coordinated with calls in other replicas, it should satisfy the three
properties (that we saw in Sect. 3) to make it concur with any call: (1) all-state-commutativity
AllSComm, (2) invariant-sufficiency InvSuff, and (3) let-P-Right-commutativity
LetPRComm. The identifier of the current buffer is r; the current
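CALL
$$\frac{\begin{array}{c}
P(\sigma, c) \qquad c(\sigma) = (\_, \sigma', v) \qquad \text{fresh } r \\
\mathit{orig}' = \mathit{orig}[r \mapsto n] \qquad \mathit{call}' = \mathit{call}[r \mapsto c] \qquad \mathit{xs}' = \mathit{xs}[n \mapsto (\mathit{xs}(n) :: r)] \\
\mathrm{ConflictSyncInit}_{\mathit{call}'}(\mathit{xs}', n, r) \qquad \mathrm{InBound}_{(\mathit{orig}', \mathit{call}')}(\mathit{xs}', n) \\
t' = t \cup \{(n', r) \mid n' \in \mathcal{N} \setminus \{n\}\}
\end{array}}{(h[n \mapsto (x \leftarrow c; s, \sigma, r')], t, \mathit{xs}, \mathit{orig}, \mathit{call}) \to (h[n \mapsto (s[x \mapsto v], \sigma', r')], t', \mathit{xs}', \mathit{orig}', \mathit{call}')}$$

DELIVER
$$\frac{\begin{array}{c}
\mathit{call}(r)(\sigma) = (\_, \sigma', \_) \qquad \mathit{xs}' = \mathit{xs}[n \mapsto (\mathit{xs}(n) :: r)] \\
\mathrm{ConflictSync}_{\mathit{call}}(\mathit{xs}', n, r) \qquad \mathrm{DepPres}_{(\mathit{orig}, \mathit{call})}(\mathit{xs}', n, r)
\end{array}}{(h[n \mapsto (s, \sigma, r')], t \cup \{(n, r)\}, \mathit{xs}, \mathit{orig}, \mathit{call}) \to (h[n \mapsto (s, \sigma', r')], t, \mathit{xs}', \mathit{orig}, \mathit{call})}$$

CALLLOCAL
$$\frac{\begin{array}{c}
P(\sigma, c) \qquad c(\sigma) = (\_, \sigma', v) \qquad c' = c \cdot \mathit{call}(r) \\
\mathrm{AllSComm}(c) \qquad \mathrm{InvSuff}(c') \qquad \mathrm{LetPRComm}(c') \qquad \mathit{call}' = \mathit{call}[r \mapsto c'] \\
\mathit{xs}' = \begin{cases} \mathit{xs}[n \mapsto (\mathit{xs}(n) :: r)] & \text{if } \mathit{call}(r) = id \\ \mathit{xs} & \text{else} \end{cases} \qquad \mathrm{InBound}_{(\mathit{orig}, \mathit{call}')}(\mathit{xs}', n)
\end{array}}{(h[n \mapsto (x \leftarrow c; s, \sigma, r)], t, \mathit{xs}, \mathit{orig}, \mathit{call}) \to (h[n \mapsto (s[x \mapsto v], \sigma', r)], t, \mathit{xs}', \mathit{orig}, \mathit{call}')}$$

SENDBUFFER
$$\frac{\begin{array}{c}
\mathit{call}(r) \neq id \qquad \text{fresh } r' \qquad \mathit{orig}' = \mathit{orig}[r' \mapsto n] \qquad \mathit{call}' = \mathit{call}[r' \mapsto id] \\
t' = t \cup \{(n', r^*) \mid n' \in \mathcal{N} \setminus \{n\}\}
\end{array}}{(h[n \mapsto (s, \sigma, r)], t, \mathit{xs}, \mathit{orig}, \mathit{call}) \to (h[n \mapsto (s, \sigma, r')], t', \mathit{xs}, \mathit{orig}', \mathit{call}')}$$

DELIVERBUFFER
$$\frac{\mathit{call}(r)(\sigma) = (\_, \sigma', \_) \qquad \mathit{xs}' = \mathit{xs}[n \mapsto (\mathit{xs}(n) :: r)]}{(h[n \mapsto (s, \sigma, r')], t \cup \{(n, r^*)\}, \mathit{xs}, \mathit{orig}, \mathit{call}) \xrightarrow{n, r, \mathit{call}(r)} (h[n \mapsto (s, \sigma', r')], t, \mathit{xs}', \mathit{orig}, \mathit{call})}$$

$id := \lambda \sigma.\ (True, \sigma, \bot)$
$P(\sigma, c) := \text{let } (g, \sigma', \_) := c(\sigma) \text{ in } (g = true \wedge \mathcal{I}(\sigma') = true)$
$\mathrm{ConflictSyncInit}_{\mathit{call}}(\mathit{xs}, n, r) := \forall n', r'.\ r' \in \mathit{xs}(n') \wedge \mathit{call}(r) \bowtie \mathit{call}(r') \to r' \in \mathit{xs}(n)$
$\mathrm{ConflictSync}_{\mathit{call}}(\mathit{xs}, n, r) := \forall n', r'.\ r' <_{\mathit{xs}(n')} r \wedge \mathit{call}(r) \bowtie \mathit{call}(r') \to r' <_{\mathit{xs}(n)} r$
$\mathrm{DepPres}_{(\mathit{orig}, \mathit{call})}(\mathit{xs}, n, r) := \forall r'.\ r' <_{\mathit{xs}(\mathit{orig}(r))} r \wedge \mathit{call}(r) \not\!\perp\!\!\!\perp \mathit{call}(r') \to r' \in \mathit{xs}(n)$
$\mathrm{AllSComm}(c) := \forall c'.\ c \leftrightarrows_S c'$
$\mathrm{InvSuff}(c) := \forall \sigma.\ \mathcal{I}(\sigma) \to P(\sigma, c)$
$\mathrm{LetPRComm}(c) := \forall c'.\ c' \rhd_P c$
$\mathrm{InBound}_{(\mathit{orig}, \mathit{call})}(\mathit{xs}, n) := \forall n'.\ \sum_{r \in \mathit{xs}(n) \setminus \mathit{xs}(n') \,\wedge\, \mathit{orig}(r) = n} Weight(\mathit{call}(r)) \le \epsilon/(|\mathcal{N}| - 1)$
$(c \cdot c')(\sigma) := \text{let } (\_, \sigma', \_) := c'(\sigma) \text{ in } c(\sigma')$


Fig. 5. Replicated system semantics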


call c is composed with the current buffered call call(r) to result in a composed
call c' for the updated buffer. The composition $\cdot$ of calls simply cascades their
updates to the state. The all-state-commutativity condition is stated for the single
call c (which implies the same condition for the composed call c' as well). This
condition is required for the call c because there might be other calls delivered
between the last buffered call and the currently buffered call c. The call c should
state-commute past the calls in between. Further, as explained for the rule CALL,
the condition InBound requires that the added staleness remains within bound.
If the above conditions hold, the map call is updated with the new buffer call c',
and the identifier r of the buffered call is added to the history xs if the buffer
was empty and the current call c is the first buffered call.
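
The composition of buffered calls can be sketched in Python as follows. The sketch assumes that a call is a function from a state to a triple (guard, post-state, return value), that the state is just the map from movies to spaces, and that None stands in for $\bot$; the concrete calls are hypothetical.

```python
def identity(sigma):
    """The identity call id: true guard, unchanged state, no return value."""
    return (True, sigma, None)


def compose(c, c_prev):
    """(c . c_prev)(sigma): apply c_prev first, then c to the resulting state."""
    def composed(sigma):
        _, sigma1, _ = c_prev(sigma)
        return c(sigma1)
    return composed


def change_space(m, delta):
    """increaseSpace for delta > 0, specialReserve for delta < 0 (guards assumed true)."""
    def call(sigma):
        new = dict(sigma)
        if m in new:
            new[m] += delta
        return (True, new, None)
    return call


# An empty buffer is the identity call; each buffered call is composed onto it.
buff = identity
for c in (change_space("A", 2), change_space("A", -1), change_space("A", 3)):
    buff = compose(c, buff)

initial = {"A": 1}
guard, final, ret = buff(initial)
```

The whole buffer is then a single call that a receiving replica can apply in one step, which is what allows it to be shipped in a single message.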

The rule SENDBUFFER sends the buffer to every other replica and resets 
the buffer. Packets transmitting buffers are decorated with an asterisk. The 
rule DELIVERBUFFER receives a packet containing a buffer. As we saw in the 
rule CALLLOCAL, buffers are checked to be invariant-sufficient in the originating
replica. Therefore, on receiving a packet containing a buffer, in contrast to
the rule DELIVER, the rule DELIVERBUFFER does not check the dependency-preservation
condition DepPres or the conflict-synchronization condition ConflictSync.

The following lemmas state the three properties of the semantics. The first
lemma states that once the buffers are flushed (call(r) = call(r') = id) and
the messages are delivered ($t = \emptyset$), the replicas converge to the same state.


Lemma 1 (Convergence). For all h, n, n', $\sigma$, $\sigma'$, r and r', if $w_0 \to^* (h, \emptyset, \_, \_, \_)$
where $h(n) = (\_, \sigma, r)$, $h(n') = (\_, \sigma', r')$ and call(r) = call(r') = id,
then $\sigma = \sigma'$.


The following lemma states that every call enjoys the integrity property.

Lemma 2 (Integrity). For all h, n, r, c, w and $\sigma$, if $w_0 \to^* (h, \_, \_, \_, \_) \xrightarrow{n, r, c} w$
where $h(n) = (\_, \sigma, \_)$, then $integrity(\sigma, c)$.


The staleness of a replica is the difference of its current state and its state 
after applying its pending calls from others (buffered calls and in transit calls). 
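The following lemma states that the staleness of every replica is bounded by $\epsilon$.


Lemma 3 (Recency). For all h, h', n, s, $\sigma$ and $\sigma'$, if $w_0 \to^* (h, \_, \_, \_, \_)$,
the world $(h', \_, \_, \_, \_)$ is reached from $(h, \_, \_, \_, \_)$ by applying the pending calls,
$h(n) = (s, \sigma, \_)$, and $h'(n) = (s, \sigma', \_)$, then $\Delta(\sigma', \sigma) \le \epsilon$.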


5 Staleness Bound Inference and Optimization 


In Sect. 4, we presented an operational semantics that preserves a given staleness 
bound for the state. The users declare the recency that they expect from the return 
value of each method of the object. The specified bounds for the methods can be 
used to infer the bounds for the elements of the state. In this section, given an object 
specification that includes recency declarations for the methods, we present a static
analysis that infers optimum staleness bounds for each element of the state. We
present a syntax-directed analysis that derives recency constraints between bound 
variables for the state elements. A solution to the constraints assigns a bound value 
to each state element such that if every state element keeps its staleness bound then 
the result of every method call respects the recency declaration of the method. The 
optimum solution maximizes the (weighted) sum of the bounds to increase buffered 
calls and hence decrease communication. 
Fig. 6. Bound constraint derivation


Figure 6 presents the constraint inference rules for the object language that
we saw in Fig. 1. A delta bound $\delta$ is either a natural number n, a delta variable
$\delta_x$, or addition or multiplication of two deltas. A constraint C is equality or
comparison of two deltas, or conjunction of two constraints. A delta environment
$\Gamma$ is a mapping from variables to delta variables or values. The judgements are
of the following forms: the judgement $o \rhd C$ states the bounding constraint C for
the object o, the judgement $m \rhd C$ states the constraint C for the method m,
and the judgement $\Gamma \vdash e \rhd \delta, C$ states that under the delta environment $\Gamma$, the
staleness of the expression e is bounded by $\delta$ when the constraints C are satisfied.
The rule COBJ states that the constraint for an object is the conjunction of the
constraints for its methods. (We assume that the state variables passed to all the
methods are renamed to the same variables $(\sigma_1, .., \sigma_n)$.) The rule CMET infers
the constraints for a method by first inferring the constraints for its return
expression under a delta environment where the argument is mapped to the
delta value of zero (exactly recent) and the state variables $\sigma_i$ are mapped to
delta variables $\delta_{\sigma_i}$ to be inferred, and second, bounding the return value. The
rule CVAL assigns the delta value zero to values with no constraints. (Values
are exact.) The rule CVAR retrieves the bindings for delta variables from the
environment. The rule COP states that the delta for the result of the operators
$\{+, -, \cup, \setminus\}$ is the sum of the deltas of its operands. On the other hand, the rule
CBOP requires the operands of the boolean operators $\{=, <, \&\}$ to be exact
and states that the result is exact as well. We elide the similar rule for the
unary negation operator !. The rule CSEL requires the selection condition to be
exact and states that the delta of the resulting relation is the same as that of the input
relation. In other words, the resulting relation is stale by the same number of
elements as the input relation. Similarly, the rule CPROJ states that the delta of
the resulting relation is the same as that of the input relation. On the other hand, the
rule CPROD states that the delta for the resulting relation is the multiplication
of the deltas for the input relations. In our running example, let us associate the
bound variables $\delta_{rs}$ and $\delta_{ms}$ to rs and ms respectively. The constraint inferred
for querySpace is $\delta_{ms} \le \epsilon_2$, for queryReservations is $\delta_{rs} \le \epsilon_1$, and for querySpaces,
which involves the join operator (product and selection), is $\delta_{rs} \times \delta_{ms} \le \epsilon_3$. More
detailed explanation of these derivations is available in the appendix [5].
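
The delta rules described above can be sketched as a small recursive evaluator. The Python code is an illustration under assumptions: expressions are encoded as hypothetical tagged tuples, the environment maps state variables directly to numeric bounds instead of delta variables, boolean operators and the exactness side conditions are omitted, and the shapes given to the queries are guesses that are only consistent with the constraints stated in the text.

```python
def delta(expr, env):
    """Staleness bound of an expression under a delta environment."""
    kind = expr[0]
    if kind == "val":                      # CVAL: values are exact
        return 0
    if kind == "var":                      # CVAR: look up the binding
        return env[expr[1]]
    if kind == "op":                       # COP: +, -, union, difference
        return delta(expr[1], env) + delta(expr[2], env)
    if kind in ("sel", "proj"):            # CSEL, CPROJ: same delta as the input
        return delta(expr[1], env)
    if kind == "prod":                     # CPROD: deltas multiply
        return delta(expr[1], env) * delta(expr[2], env)
    raise ValueError("unknown expression kind: " + str(kind))


# The method argument is exact (delta 0); rs and ms carry candidate bounds.
env = {"arg": 0, "rs": 3, "ms": 2}

query_space = ("proj", ("sel", ("var", "ms")))
query_reservations = ("proj", ("sel", ("var", "rs")))
query_spaces = ("proj", ("sel", ("prod", ("var", "rs"), ("var", "ms"))))

bounds = {
    "querySpace": delta(query_space, env),
    "queryReservations": delta(query_reservations, env),
    "querySpaces": delta(query_spaces, env),
}
```

With these bounds, the join in querySpaces is stale by at most the product of the two input bounds, which is the source of the nonlinear constraint on $\delta_{rs} \times \delta_{ms}$.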

We now define the notion of sufficiently-recent states. Intuitively, a state is 
sufficiently-recent with respect to the target state if the difference of the return 
value of every method call on that state versus the target state is within the 
declared bound of the method. 


Definition 12 (Sufficiently-recent State). A state $(v_1, .., v_n)$ is a
sufficiently-recent state with respect to the target state $(v_1^*, .., v_n^*)$ for an object
o iff for every method def $\epsilon$ $m(x)((\sigma_1, .., \sigma_n))$ $(e_g, e_u, e_r)$ of o, and every argument
v, letting $v_r$ be $[\![e_r[x \mapsto v][\sigma_i \mapsto v_i]]\!]$ and $v_r^*$ be $[\![e_r[x \mapsto v][\sigma_i \mapsto v_i^*]]\!]$, we have
$\Delta(v_r, v_r^*) \le \epsilon$.

The following lemma states that the bound inference presented in Fig. 6 is
sound. In other words, if the inference derives the constraints C for an object,
then for any solution S of C, if the staleness of each state element $\sigma_i$ of the object
remains within the bound $S(\delta_{\sigma_i})$, then the state remains sufficiently-recent.


Lemma 4 (Soundness of Bound Inference). Given an object o with the
state variables $(\sigma_1, .., \sigma_n)$, if $o \rhd C$, that is, the constraints C (over the bound
variables $\delta_{\sigma_i}$) are derived for o, and S is a solution for C, then for every pair
of states $\sigma = (v_1, .., v_n)$ and $\sigma^* = (v_1^*, .., v_n^*)$, if $\Delta(v_i, v_i^*) \le S(\delta_{\sigma_i})$ for every i, then $\sigma$ is
sufficiently-recent for $\sigma^*$.


There may be many solutions for the derived constraints, and hence, many 
sound state bounds that preserve the user-specified bounds for the object. How- 
ever, solutions that allow more staleness (albeit appropriately bounded) are more 
favorable since they allow more buffered calls and require less communication. 
Thus, a candidate objective function to maximize is $\delta_{\sigma_1} + .. + \delta_{\sigma_n}$. In other
words, what are the largest delta bounds for the state elements that still preserve
the recency specifications of the methods? This function gives the same
weight to all the state elements; however, some may be updated more frequently.
Let $f_i$ be the relative update frequency of the state element $\sigma_i$. Frequencies can
be obtained from historical logs or profiling. The objective function is defined
as the following weighted sum: $\delta_{\sigma_1}/f_1 + .. + \delta_{\sigma_n}/f_n$. More frequently updated
state elements are given proportionally larger bounds. In our running example,
let $\epsilon_1 = 3$, $\epsilon_2 = 4$, and $\epsilon_3 = 6$. If the update frequency of rs is twice that of ms, the
optimum solution is $\delta_{rs} = 3$ and $\delta_{ms} = 2$.
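
The running example is small enough to solve by enumeration. The Python sketch below searches the integer bounds that satisfy the three constraints of the example. Its weighting is an assumption: each bound is multiplied by the relative update frequency of its state element, which is the reading that favors frequently updated elements and under which the optimum stated above is obtained. The function name and the brute-force search are hypothetical, not the optimization procedure of the paper.

```python
from itertools import product


def optimum_bounds(eps1, eps2, eps3, f_rs, f_ms):
    """Maximize f_rs * d_rs + f_ms * d_ms subject to
    d_rs <= eps1, d_ms <= eps2 and d_rs * d_ms <= eps3."""
    best, best_value = None, None
    for d_rs, d_ms in product(range(eps1 + 1), range(eps2 + 1)):
        if d_rs * d_ms > eps3:
            continue  # violates the constraint of querySpaces
        value = f_rs * d_rs + f_ms * d_ms
        if best_value is None or value > best_value:
            best, best_value = (d_rs, d_ms), value
    return best


# rs is updated twice as often as ms.
solution = optimum_bounds(eps1=3, eps2=4, eps3=6, f_rs=2, f_ms=1)
```

With equal frequencies the search would instead return the first of several tied solutions, so ties would need an explicit rule in a real implementation.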


Definition 13 (Recency Bound Optimization). Given an object o and the
relative update frequency $f_i$ of the state elements $\sigma_i$ of o, if $o \rhd C$ then the
optimum staleness bounds for o are the solution S of C that maximizes
$\delta_{\sigma_1}/f_1 + .. + \delta_{\sigma_n}/f_n$.

The objective function can be translated to a linear
function by multiplying by the least common denominator of the frequencies.


6 The Power and the Protocol of Recency-Aware Objects 


Now, we show that recency-aware objects are stronger than the perfect failure 
detector abstraction [18] and present a protocol that implements recency-aware 
objects using perfect failure detectors. These two results show that recency-aware 
objects have the same computational power as the perfect failure detector. 

The perfect failure detector abstraction P notifies processes about the crash 
of the other processes in a synchronous network. It has the following properties: 
Liveness: Every crashed process is eventually detected by all correct processes. 
Safety: No correct process is ever suspected by other processes. The recency-aware
object R has the following liveness and safety properties. Liveness: If
the user makes a request to a correct replica, it eventually responds. Safety:
The set of executed calls that are still pending for each correct replica is bounded. The
following lemma states that P is reducible to R and, conversely, that R is
reducible to P.


RecencyAwareObject
  request: call(C)
  indication: ret(C, V) | aborted(C)

Params:
  ε: Int
  SConf: Set[M]

Using:
  rb: ReliableBroadcast
  pl: PerfectPointToPointLink
  pfd: PerfectFailureDetector
  bro: BasicRepObject

State:
  σ: Σ = σ0; buff = ∅; wq = ∅;
  up = N; p: N → Set[C] = N ↦ ∅

request (call(c))
  if (¬P(σ, c))
    issue indication aborted(c)
  else
    if (method(c) ∉ SConf ∧
        InvSuff(buff) ∧ LetPRComm(buff))
      foreach (r ∈ up \ {self})
        p'(r) ← (p(r) \ {buff}) ∪ {c · buff}
      if (InBound(p'))
        p ← p'
        exec(c); buff ← c · buff
      else
        issue request (rb, broadcast(buff(buff)))
        insert(wq, c)
    else
      if (method(c) ∉ blocked(bro))
        foreach (r ∈ up \ {self})
          p'(r) ← p(r) ∪ {c}
        if (InBound(p'))
          p ← p'
          issue request (bro, call(c))
        else
          issue request (rb, broadcast(buff(buff)))
          insert(wq, c)

indication crash(pfd, p)
  up ← up \ {p}

fun InBound(p)
  foreach (n ∈ up)
    if (Σ_{c' ∈ p(n)} Weight(c') > ε/(|N| − 1))
      return False
  return True

indication (rb, deliver(n, buff(buff)))
  if (self ≠ n)
    exec(buff)
    issue request (pl, send(n, ack(buff)))

indication (pl, deliver(n, ack(c)))
  p ← p[n ↦ (p(n) \ {c})]
  foreach (c ∈ wq) issue request (call(c))
  wq ← ∅

fun exec(c)
  σ ← update(c)(σ); v ← retv(c)(σ)
  issue indication ret(c, v)

indication (bro, ret(c, v))
  issue request (pl, send(orig(c), ack(c)))
  issue indication ret(c, v)


Fig. 7. Recency-aware protocol
Lemma 5. P ⪯ R ∧ R ⪯ P. 


For the proof of the first conjunct, consider two replicas rep1 and rep2. We show 
by contradiction that rep1 will eventually know whether rep2 has crashed. We 
assume the opposite. Consider an execution where rep1 has already executed a 
set of requests R and receives another request r from the user, such that the 
pending set R ∪ {r} makes a difference in the state of rep2 that pushes it out 
of bound. By the contradiction assumption, rep1 is never informed when rep2 
crashes. Therefore, if rep1 does not hear from rep2, the following two scenarios 
are indistinguishable to rep1. (S1) The replica rep2 has crashed. (S2) The replica 
rep2 is too slow. The replica rep1 has the following two choices: (C1) The replica 
rep1 waits to hear from rep2 about receiving a request in R before processing 
and responding to r. (C2) The replica rep1 processes and responds to r. If the 
protocol makes the choice C1, it might be the scenario S1 and then the liveness 
property is violated. If the protocol makes the choice C2, it might be the scenario 
S2 and then the recency bound for rep2 is violated. 

The second conjunct follows directly from the protocol. We briefly describe 
the protocol in Fig. 7 that implements a recency-aware replicated object. The 
full description of the protocol is available in the appendix [5]. Given an object 
definition, the protocol benefits from both static and dynamic coordination anal- 
ysis to guarantee convergence, integrity and recency. To reduce communication, 
replicas try to execute the calls locally while maintaining the staleness bound 
e. Each replica keeps its locally executed calls in a buffer buff before they are 
broadcast. Replicas send an acknowledgement ack to the originating replica once 
they receive and execute a call or a buffer of calls. Each replica rep keeps a map 
called pending p from each replica rep’ to the set of pending calls sent from 
rep to rep’. When a replica originates a call c, it adds c to its local pending set 
for each of the other replicas; once it receives an acknowledgement for c from 
a replica rep’, it removes c from the set of pending calls for rep’. Each replica 
keeps the set of correct replicas up, and removes a replica from the set if the 
perfect failure detector pfd issues a crash event for that replica. A requested call 
can be executed only if it does not push the pending set for any correct replica 
out of the bound. Otherwise, it cannot be immediately executed and is kept in a 
waiting queue wq to be retried later, and further, the buffer is sent to the other 
replicas and is reset to accelerate the shrinking of the pending set. To decide 
whether a call can be executed locally, the conditions of the rule CALLLOCAL 
of the operational semantics (Sect. 4) are checked. The set of state-conflicting 
methods SConf that is statically calculated is consulted to check if the call is all- 
state-commutative. The validity of the two conditions invariant-sufficiency and 
let-P-R-commutativity of the buffer (after the new call is added) are dynami- 
cally decided by a solver at run-time. If the conditions do not hold, the call is 
coordinated with other replicas using the basic blocking coordination protocol 
bro [30] that guarantees integrity and convergence but not recency. 
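
The bookkeeping of the pending map and the bound check described above can be sketched as follows. This is a minimal illustration and not HAMPA's implementation: the function names, the representation of calls as (label, weight) pairs, and the example bound are assumptions made for the sketch. The check itself follows InBound in Fig. 7: a call may be executed locally only if, for every replica still in up, the total weight of its pending calls does not exceed e/(N − 1).

```python
from typing import Callable, Dict, Hashable, Set

Pending = Dict[str, Set[Hashable]]


def in_bound(pending: Pending, up: Set[str], weight: Callable[[Hashable], float],
             bound: float, n_replicas: int) -> bool:
    """True iff every live replica's pending weight is at most bound / (N - 1)."""
    limit = bound / (n_replicas - 1)
    return all(sum(weight(c) for c in pending[r]) <= limit for r in up)


def try_execute_locally(call: Hashable, self_id: str, pending: Pending, up: Set[str],
                        weight: Callable[[Hashable], float],
                        bound: float, n_replicas: int) -> bool:
    """Tentatively add `call` to the pending set of every other live replica.

    The update is committed only if the bound still holds; otherwise the
    pending map is left untouched and the caller should queue the call.
    """
    tentative = {r: set(calls) for r, calls in pending.items()}
    for r in up - {self_id}:
        tentative[r].add(call)
    if not in_bound(tentative, up, weight, bound, n_replicas):
        return False
    pending.update(tentative)
    return True


def acknowledge(call: Hashable, sender: str, pending: Pending) -> None:
    """An ack from `sender` removes `call` from that replica's pending set."""
    pending[sender].discard(call)


# Illustrative run with made-up values: three replicas, staleness bound e = 6,
# so each replica may have pending calls of total weight at most 6 / 2 = 3.
replicas = {"r1", "r2", "r3"}
up = set(replicas)
pending: Pending = {r: set() for r in replicas}
weight = lambda call: call[1]            # a call is a (label, weight) pair here
c1, c2 = ("deposit#1", 2), ("deposit#2", 2)

ok1 = try_execute_locally(c1, "r1", pending, up, weight, bound=6, n_replicas=3)
ok2 = try_execute_locally(c2, "r1", pending, up, weight, bound=6, n_replicas=3)
acknowledge(c1, "r2", pending)
acknowledge(c1, "r3", pending)
ok3 = try_execute_locally(c2, "r1", pending, up, weight, bound=6, n_replicas=3)
```

In this run the first call fits within the bound, the second does not until the acknowledgements for the first arrive, and the retry then succeeds. This mirrors the role of the waiting queue wq in the protocol.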
7 Experimental Results 


We have implemented the analysis and protocol as a synthesis tool called HAMPA. 
We applied it to two use-cases: the bank account use-case (with the withdraw, 
deposit and balance methods and the integrity property of non-negative balance) 
and the movie booking use-case (Fig. 2). The experiments show that as 
the staleness bound increases, the coordination overhead and response time of 
recency-aware objects decrease. Further, recency-aware objects are about twice as 
responsive as their sequentially consistent counterparts. 


Platform and Setup. The experiments are conducted on a cluster of 4 computing 
nodes. Each node has 2 AMD Opteron 6272 CPUs with a total of 8 cores, 
64 GB ECC memory and a 40 Gbps InfiniBand network. JDK is openjdk version 
1.8.0_222. We used the CVC4 [11] SMT solver v.1.7. Reported numbers are the 
arithmetic means of results from three repetitions on 4 replicas. In the experi- 
ments for the bank account use-case, all the calls are applied to the same account 
object and the amount is selected randomly in the range [10,20]. For the movie 
use-case, we send requests for each movie identifier to the same replica. Further, 
we do not issue offScreen calls because taking a movie off-screen causes later 
method calls on the same movie to be aborted and thus, these methods are not 
fully exercised. Issuing offScreen calls would therefore significantly improve the 
measured response time. In practice, however, offScreen calls are rarely used. The 
movie and user IDs are chosen at random from six and one hundred unique IDs, 
respectively. In all the experiments, we execute 
500 calls in millisecond intervals evenly distributed between 4 replicas. 
Fig. 8. Effect of recency on coordination load and response time. (a) and (b) show the 
bank account use-case. d, w, and b stand for deposit, withdraw and balance (with the 
frequencies of 75%, 25%, 5% in the workload respectively). (c) and (d) show the movie 
use-case. c, b, q, s, and i stand for cancelBook, book, querySpace, specialReserve, and 
increaseSpace (with frequencies 4%, 6%, 5%, 40% and 45%). 
Measurements. We measure two comparison criteria: coordination load and 
response time. At the lower layers, the protocol reduces to three communication 
primitives: total-order-broadcast (TOB), reliable-broadcast (RB) and point-to- 
point links (P2P). To measure the coordination overhead, we separately count 
the number of different types of messages that replicas send during the execution 
of their requests. The response time for a call is the duration between the time 
that the client requests the call and the time that the client receives the return 
value. 

We performed three experiments. In the first experiment, we study the effect 
of increasing the staleness bound on the coordination load. We report the ratio 
of the number of messages that the protocol sends for the bound under test over 
the number of messages that it sends for the baseline bound. (The baseline 
recency bound is the maximum weight of the calls. The baseline allows every 
single call to be buffered.) In the second experiment, we study the effect of 
increasing the staleness bound on the response time of each method. Finally, in 
the last experiment, we compare the response time of our protocol under the 
baseline recency bound with that of sequential consistency (SC). SC uses total-order 
broadcast for all the methods. 


Assessment. Figure 8(a) and (c) show the effect of increasing the staleness 
bound on the coordination load for the two use-cases. As the staleness bound 
is increased, the ratio of the messages sent by RB, TOB and P2P decreases. 
Figure 8(a) (bank account) shows an 88% decrease in the number of messages sent 
to RB when the bound is increased from 20 to 200. Likewise, the TOB and 
P2P ratios decrease by 78% and 90%, respectively. In Fig. 8(c) (movie book- 
ing), buffering helps to reduce TOB calls by 40% across the experiments. This 
decrease, however, unlike the bank account use-case, is steady over different 
bounds. This is because it is more difficult to “buffer” in the movie booking 
use-case. There are no S-conflicts in the bank account use-case and hence two 
out of two update methods can be buffered. However, S-conflicts in the movie 
use-case allow only 2 out of 4 update methods to be buffered: increaseSpace 
and specialReserve. Also, we observe that the number of RB and P2P messages 
decreases by at most 10%. 
the book operation cannot be buffered 
due to the S-conflict with other methods and has to be synchronized. On the 
other hand, the response time of the specialReserve method decreases by 33% 
when the bound is increased from 2 to 20. The reason is that it has a self-conflict: 
if it cannot be buffered, it must be synchronized by TOB, which 
incurs a high coordination overhead. Therefore, as buffered calls increase 
and the use of TOB decreases, the response time is significantly improved. The 
response time of the increaseSpace method also benefits from recency awareness; 
it decreases by 72%. The methods book and cancelBook have conflicts. In the 
blocking protocol that HAMPA uses, the method book handles synchronization; 
therefore, the method cancelBook just broadcasts the request. As the recency 
bound is increased, the network is less crowded and therefore, the response time 
of cancelBook is decreased. 

Figure 9 compares the response time of recency-aware objects under the baseline 
bound with that of sequentially consistent objects. The SC protocol synchronizes 
all the calls and orders them with respect to each other. In contrast, HAMPA 
minimizes coordination while preserving convergence, integrity and recency. We 
observe that the average response time speedup is as high as 2x and 1.8x 
for the bank account and movie use-cases, respectively. More experiments are 
available in the appendix [5]. In particular, they show that the runtime cost of 
SMT solving is only 0.2% to 1% of the average response time. 


[Fig. 9: response time (ms) plot] 


8 Related Work 


Epsilon serializability [46] allows concurrent execution of updates with queries 
and bounds the difference of the inconsistent values that are observed in these 
executions and the consistent values that would be observed in a serializable exe- 
cution. In contrast, HAMPA preserves the integrity of the state, bounds staleness, 
allows different orders in different replicas, and formally defines the difference 
for relational operators. 
In TACT [54-58], operations return tentative values; they might be eventually 
reordered to preserve strong consistency. TACT bounds the numeric error 
between the tentative and final return values. The user specifies the granularity 
of the bounded object “conit” and the strength of the protocol. In HAMPA, on the 
other hand, the states are final and enjoy integrity provided on top of weak 
consistency. Further, the staleness bound with respect to the pending future 
state is automatically optimized with static and dynamic analyses. 

In AQuA [31], given a query and a staleness bound, the master server dynam- 
ically selects a recent enough server to service the query. Similarly, TRAPP [43] 
finds recent enough servers for different parts of data that are needed for the 
query. FRACS [59] allows operations to be buffered at replicas up to a given 
threshold. In contrast to HAMPA, these projects do not guarantee integrity and 
convergence, and do not automatically infer the staleness bounds. PIQL [6] 
bounds the number of key-value store operations for each query, trading the 
precision of the result for performance. However, it does not consider the stale- 
ness of replicas. 

To reduce synchronization, PBS [9] communicates with only a partial quorum 
of replicas to bring a total order to operations, and probabilistically bounds the 
staleness of the observed states. In contrast, HAMPA performs synchronization 
with full quorums but only for conflicting calls, and allows different orders for 
replicas. Further, it analyzes and synthesizes replicated objects and supports 
relational in addition to single-key operations. 

The trade-off between consistency and latency presented as PACELC [1] 
aligns with our experiments. As the consistency decreases (staleness bound 
increases), the latency decreases (responsiveness increases). Warranties [38] and 
Homeostasis [47] allow local updates if they keep the validity of certain assertions. 
Although other replicas can rely on the validity of the assertions, the staleness 
of their state is not bounded. In contrast, HAMPA maintains a staleness bound. 
Further, it exploits weak consistency and guarantees convergence. 


9 Conclusion 


This paper presented a relational object specification language that captures the 
integrity and recency requirements of the object. It presented a syntax-directed 
analysis that, given a specification, infers optimum staleness bounds. In addition, 
it presented the coordination avoidance conditions, operational semantics, 
a protocol and a synthesis tool for replicated systems that guarantee conver- 
gence, integrity and recency. The recency-aware protocol embeds a solver to 
decide whether coordination avoidance is safe and increases the responsiveness. 
Abstract. Linearizability is the de facto correctness criterion for concurrent 
data type implementations. Violation of linearizability is witnessed 
by an error trace in which the outputs of individual operations do 
not match those of a sequential execution of the same operations. Extensive 
work has been done in discovering linearizability violations, but little 
sive work has been done in discovering linearizability violations, but little 
work has been done in trying to provide useful hints to the programmer 
when a violation is discovered by a tester tool. In this paper, we pro- 
pose an approach that identifies the root causes of linearizability errors 
in the form of code blocks whose atomicity is required to restore lin- 
earizability. The key insight of this paper is that the problem can be 
reduced to a simpler algorithmic problem of identifying minimal root 
causes of conflict serializability violation in an error trace combined with 
a heuristic for identifying which of these are more likely to be the true 
root cause of non-linearizability. We propose theoretical results outlin- 
ing this reduction, and an algorithm to solve the simpler problem. We 
have implemented our approach and carried out several experiments on 
realistic concurrent data types demonstrating its efficiency. 


1 Introduction 


Efficient multithreaded programs typically rely on optimized implementations 
of common abstract data types (ADTs) like stacks, queues, sets, and maps [31], 
whose operations execute in parallel across processor cores to maximize per- 
formance [36]. Programming these concurrent objects correctly is tricky. Syn- 
chronization between operations must be minimized to reduce response time 
and increase throughput [23,36]. Yet this minimal amount of synchronization 
must also be adequate to ensure that operations behave as if they were exe- 
cuted atomically, one after the other, so that client programs can rely on the 
(sequential) ADT specification; this de facto correctness criterion is known as 
linearizability [24]. These opposing requirements, along with the general challenge 
in reasoning about thread interleavings, make concurrent objects a ripe source 
of insidious programming errors [12,15,35]. 

Program properties like linearizability that are difficult to determine stati- 
cally are typically substantiated by dynamic techniques like testing and runtime 
verification. While monitoring linearizability of an execution against an arbitrary 
ADT specification requires exponential time in general [20], there exist several 
efficient approaches for dealing with this problem that led to practical tools, 
e.g., [3,4,13,14,16,33,39,47]. Although these approaches are effective at identi- 
fying non-linearizable executions of a given object, they do not provide any hints 
or guidelines about the source of a non-linearizability error once one is found. 
If some sort of root-cause for non-linearizability can be identified, for example a 
minimal set of commands in the code that explain the error, then the usability 
of such testing tools will significantly increase for average programmers. Root- 
causing concurrency bugs in general is a difficult problem. It is easy enough to 
fix linearizability if one is willing to disregard or sacrifice performance measures, 
e.g., by enforcing coarse-grain atomic sections that span a whole method body. 
It is difficult to localize the problem to a degree that fixing it would not affect 
the otherwise correct behaviours of the ADT. Simplifying techniques, such as 
equating root causes with some limited set of “bad” patterns, e.g., a non-atomic 
section formed of two accesses to the same shared variable [10,28,38], have been 
used to provide efficient coarse approximations for root cause identification. 

In this paper, we present an approach for identifying non-linearizability root- 
causes in a given execution, which equates root causes with optimal repairs that 
rule out the non-linearizable execution and as few linearizable executions as 
possible (from a set of linearizable executions given as input). Our approach can 
be extended to a set of executions and therefore in the limit identify the root 
cause of the non-linearizability of an ADT as a whole. Sequential¹ executions of a 
concurrent object are linearizable, and therefore, linearizability bugs can always 
be ruled out by introducing one atomic section per method in the ADT. 
Thus, focusing on atomic sections as repairs, there is a guarantee of existence of 
a repair in all scenarios. We emphasize the fact that our goal is to interpret such 
repairs as root-causes. Implementing these repairs in the context of a concrete 
concurrent object using synchronization primitives (e.g., locks) is orthogonal and 
beyond the scope of this paper. Some solutions are proposed in [28,29,46]. 

As a first step, we investigate the problem of finding all optimal repairs in 
the form of sets of atomic sections that rule out a given (non-linearizable) execution. 
A repair is considered optimal when, roughly, it allows a maximal number of 
interleavings. We identify a connection between this problem and conflict 
serializability [37], an atomicity condition originally introduced in the context of 
database transactions. In the context of concurrent programs, given a decompo- 
sition of the program’s code into code blocks, an execution is conflict serializable 


1 An execution is called sequential when methods execute in isolation, one after 
another. 


352 B. Cirisci et al. 


Shared variables: 
  range: integer initialized to 0 
  items: array of objects initialized to NULL 

1  procedure push(x) 
2    i := F&I(range); 
3    items[i] := x 

4  procedure pop() 
5    t := range-1; 
6    x := NULL; 
7    for i := t downto 1 { 
8      x := items[i]; 
9      items[i] := null; 
10     if ( x != null ) break; } 
11   return x; 


Fig. 1. A non-linearizable concurrent stack. 


if it is equivalent² to an execution in which all code blocks are executed in a 
sequential non-interleaved fashion. A repair that rules out a non-linearizable 
execution τ can be obtained using a decomposition of the set of events in τ 
into a set of blocks that we call intervals, such that τ is not conflict serializable 
with respect to this decomposition. Each interval will correspond to an atomic 
section in the repair (obtained by mapping events in the execution to statements 
in the code). A naive approach to compute all optimal repairs would enumerate 
all decompositions into intervals and check conflict serializability with respect to 
each one of them. Such an approach would be inefficient because the number 
of possible decompositions is exponential in both the number of events in the 
execution and the number of threads. We show that this problem is actually 
solvable in polynomial time assuming a fixed number of threads. This is quite non-trivial 
and requires a careful examination of the cyclic dependencies in non conflict- 
serializable executions. Assuming a fixed number of threads is not an obstacle 
in practice since recent work shows that most linearizability bugs can be caught 
with client programs with two threads only [12,15]. 

In general, there may exist multiple optimal repairs that rule out a non- 
linearizable execution. To identify which repairs are more likely to correspond to 
root-causes, we rely on a given set of linearizable executions. We rank the repairs 
depending on how many linearizable executions they disable, prioritizing those 
that exclude fewer linearizable executions. This is inspired by the hypothesis 
that cyclic memory accesses occurring in linearizable executions are harmless. 

We evaluated this approach on several concurrent objects, which are varia- 
tions of lock-based concurrent sets/maps from the Synchrobench repository [21]. 
We considered a set of non-linearizable implementations obtained by modifying 
the placement of the lock/unlock primitives, and applied a linearizability test- 
ing tool called Violat [14] to obtain client programs that admit non-linearizable 
executions. We applied our algorithms on the executions obtained by running 
these clients using Java Pathfinder [44]. Our results show that our approach is 
highly effective in identifying the precise root cause of linearizability violations 
since in every case, our tool precisely identifies the root cause of a violation that 
is discoverable by the client of the library used to produce the error traces. 


2 Two executions are equivalent if, roughly, they are the same modulo reordering 
statements that do not access the same shared variable. 
2 Overview 


Figure 1 lists a variation of a concurrent stack introduced by Afek et al. [1]. 
The values pushed into the stack are stored into an unbounded array items; 
a shared variable range keeps the index of the first unused position in items. 
The push method stores the input in the array and it increments range using 
a call to an atomic fetch and increment (F&I) primitive. This primitive returns 
the current value of range while also incrementing it at the same time. The pop 
method reads range and then traverses the array backwards starting from the 
predecessor of this position, until it finds a position storing a non-null value. It 
also nullifies all the array cells encountered during this traversal. If it reaches the 
bottom of the array without finding non-null values, it returns that the stack is 
empty. 

This concurrent stack is not linearizable as witnessed by the execution in 
Fig. 2. This is an execution of a client with three threads executing two push 
and two pop operations in total. The push in the first thread is interrupted by 
operations from the other two threads which makes both pop operations return 
the same value b. The execution is not linearizable because the value b was 
pushed only once and it cannot be returned by two different pop operations. 
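
The interleaving can be replayed deterministically. The sketch below is an illustration rather than part of the paper's tooling: it models each operation of Fig. 1 as a Python generator that yields after every shared-memory access, so that a schedule (a list of thread names, a device of this sketch) decides which thread takes the next step. Two modelling assumptions are made: the array has a small fixed size, and the loop in pop runs down to index 0 so that the cell written by push(b) is reachable, as in the execution of Fig. 2.

```python
class Stack:
    def __init__(self, size=8):
        self.range = 0               # index of the first unused position
        self.items = [None] * size   # None plays the role of null


def push(s, x):
    i = s.range                      # F&I(range): read and increment in one step
    s.range += 1
    yield
    s.items[i] = x
    yield


def pop(s, results, name):
    t = s.range - 1
    yield
    x = None
    for i in range(t, -1, -1):
        x = s.items[i]               # line 8 of Fig. 1
        yield
        s.items[i] = None            # line 9 of Fig. 1
        if x is not None:
            break
        yield
    results[name] = x                # returning is a thread-local step
    yield


def replay(schedule):
    """Run the three-thread client, taking one shared access per schedule entry."""
    s, results = Stack(), {}

    def thread2():
        yield from push(s, "b")
        yield from pop(s, results, "r2")

    threads = {"T1": push(s, "a"), "T2": thread2(), "T3": pop(s, results, "r3")}
    for name in schedule:
        next(threads[name], None)
    return results, s


# Event order as read from Fig. 2: the push(a) of T1 is interrupted by T2 and T3.
FIG2_SCHEDULE = ["T2", "T1", "T2",          # both F&I steps, then items[0] = b
                 "T2", "T2", "T2", "T2",    # T2 pop: reads range, scans down to items[0]
                 "T3", "T3", "T3", "T3",    # T3 pop: same scan, also reads b
                 "T1",                      # items[1] = a
                 "T3", "T2"]                # both pops nullify items[0] and return

results, final_state = replay(FIG2_SCHEDULE)
print(results)
```

Both pops read items[0] before either of them nullifies it, so both report b even though b was pushed once.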

The root-cause of this violation is the non-atomicity of the statements at 
lines 8 and 9 of pop, reading items[i] and updating it to null. The stack is 
linearizable when the two statements are executed atomically (see [1]). 


Client program: 
  push(a);  ||  push(b); r2 = pop();  ||  r3 = pop(); 

Execution (events listed in the order they occur): 
  // Thread 2: push(b) 
  i = F&I(range)       // 0 
  // Thread 1: push(a) 
  i = F&I(range)       // 1 
  // Thread 2: push(b), continued 
  items[0] = b 
  // Thread 2: pop() 
  t = range - 1        // 1 
  x = items[1]         // null 
  items[1] = null 
  x = items[0]         // b 
  // Thread 3: pop() 
  t = range - 1        // 1 
  x = items[1]         // null 
  items[1] = null 
  x = items[0]         // b 
  // Thread 1: push(a), continued 
  items[1] = a 
  // Thread 3: pop(), continued 
  items[0] = null 
  return b 
  // Thread 2: pop(), continued 
  items[0] = null 
  return b 


Fig. 2. A client program of the concurrent stack of Fig. 1 and one of its non-linearizable 
executions illustrated as a sequence of read/write events. 
Our goal is to identify such root-causes. We start with a non-linearizable 
execution like the one in Fig. 2. The first step is to compute all optimal repairs 
in the form of atomic sections that disable the non-linearizable execution. There 
are two such optimal repairs for the execution in Fig. 2: (1) an atomic section 
containing the statements at lines 8 and 9 in pop (representing the root-cause), 
and (2) an atomic section that includes the two statements in the push method. 

These repairs disable the execution because each pair of statements is interleaved 
with conflicting³ memory accesses in that execution. This is illustrated by 
the boxes and the edges in Fig. 2 labeled by cf: the boxes include these two pairs 
of statements and the edges emphasize the order between conflicting memory 
accesses. In Sect. 5, we formalize this by leveraging the notion of conflict serial- 
izability. The execution is not conflict-serializable assuming any decomposition 
of the code in Fig. 1 into a set of code blocks (transactions) such that one of 
them contains one of these two pairs. These repairs are optimal because they 
consist of a single atomic section of minimal size (with just two statements). We 
formalize a generic notion of optimality in Sect. 4 through the introduction of an 
order relation between repairs, defined as component-wise inclusion of atomic 
sections and compute the minimal repairs w.r.t. this order. 

At the end of the first phase, our approach produces a set of all such optimal 
(incomparable) repairs. To isolate one as the best candidate, we use a heuristic 
to rank the optimal repairs. The heuristic relies on the hypothesis that repairs 
which disable fewer linearizable executions are more likely to represent the best 
candidate for the true root-cause of a linearizability bug. 

For instance, the client in Fig. 2 admits a linearizable execution where the 
first two threads are interleaved exactly as in Fig. 2 and where the pop in the 
third thread executes after the first two threads have finished. This is linearizable 
because the pop in the third thread returns the value a written by the push in 
the first thread in items[1] (this is the first non-null array cell starting from the 
end). Focusing on the two optimal repairs mentioned above, enforcing only the 
atomic section in the push will disable this linearizable execution. The atomic 
section in the pop, which permits this execution, is ranked higher to indicate it 
as the more likely root-cause. This is the expected result for our example. 

This ranking scheme can easily be extended to a set of linearizable executions. 
Given a set of linearizable executions, we rank optimal repairs by keeping track 
of how many of the linearizable executions each disables. 
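
The ranking step can be sketched in a few lines. This is a hedged illustration: the function name, the string labels for repairs and executions, and the lookup table standing in for the "does this repair disable this execution" check are all assumptions of the sketch. In the approach described here that check is derived from the trace itself; the table below only encodes the outcome stated in the discussion above.

```python
from typing import Callable, Hashable, Iterable, List, Tuple


def rank_repairs(
    repairs: Iterable[Hashable],
    linearizable_executions: Iterable[Hashable],
    disables: Callable[[Hashable, Hashable], bool],
) -> List[Tuple[int, Hashable]]:
    """Return (number of disabled linearizable executions, repair), best first."""
    executions = list(linearizable_executions)
    scored = [(sum(1 for e in executions if disables(r, e)), r) for r in repairs]
    return sorted(scored, key=lambda pair: pair[0])


# Hypothetical labels for the two optimal repairs of the running example and
# for the linearizable execution in which the third thread runs last.
REPAIR_POP = "atomic{pop: lines 8-9}"
REPAIR_PUSH = "atomic{push: lines 2-3}"
EXEC_T3_LAST = "T1 and T2 as in Fig. 2, then T3"

# Assumed outcome of the disabling check, taken from the discussion above:
# only the atomic section in push rules out this linearizable execution.
DISABLED = {(REPAIR_PUSH, EXEC_T3_LAST)}

ranking = rank_repairs(
    [REPAIR_PUSH, REPAIR_POP],
    [EXEC_T3_LAST],
    lambda repair, execution: (repair, execution) in DISABLED,
)
print(ranking)
```

The repair that disables fewer linearizable executions comes first, which here is the atomic section in pop.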


3 Preliminaries 


We formalize executions of a concurrent object as sequences of events repre- 
senting calling or returning from a method invocation (called operation), or an 
access (read or write) to a memory location. Then, we recall the notion of lin- 
earizability [24]. 


3 As usual, two memory accesses are conflicting when they access the same variable 
and at least one of them is a write. 
We fix arbitrary sets M and V of method names and parameter/return values. 
We fix an arbitrary set O of operation identifiers, and for given sets M and V 
of methods and values, we fix the sets C = {o.call m(v) : m ∈ M, v ∈ V, o ∈ O} 
and R = {o.ret v : v ∈ V, o ∈ O} of call actions and return actions. Each call 
action o.call m(v) combines a method m ∈ M and value v ∈ V with an operation 
identifier o ∈ O. A return action o.ret v combines an operation identifier o ∈ O 
with a value v ∈ V. Operation identifiers are used to pair call and return actions. 
Also, let L be a set of (shared) memory locations and A = {o.rd(x), o.wr(x) : 
o ∈ O, x ∈ L} the set of read and write actions. The operation identifier of an 
action a is denoted by op(a). 

We fix an arbitrary set T of thread ids. An event is a tuple (t, a) formed of 
a thread id t ∈ T and an action a. A trace τ is a sequence of events satisfying 
standard well-formedness properties, e.g., the projection of τ on events of the 
same thread is a concatenation of sequences formed of a call action, followed by 
read/write actions with the same operation identifier, and a return action. Also, 
we assume that every atomic section (block) is interpreted as an uninterrupted 
sequence of events that correspond to the instructions in that atomic section. 

We define two relations over the events in a trace τ: the program order relation 
po_τ relates any two events e1 and e2 of the same thread such that e1 occurs 
before e2 in τ, and the conflict relation cf_τ relates any two events e1 and e2 
of different threads that access the same location, at least one of them being a 
write, such that e1 occurs before e2 in τ. We omit the subscript τ when the trace 
is understood from the context. 

Two traces τ1 and τ2 are called equivalent, denoted by τ1 ≡ τ2, when po_τ1 = 
po_τ2 and cf_τ1 = cf_τ2. They are called po-equivalent when only po_τ1 = po_τ2. 
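
The two relations and the resulting notion of equivalence can be computed directly from a trace. The sketch below is illustrative only: the Event record, its seq field (used to keep events of one thread distinct), and the function names are assumptions of the sketch, and call/return events are modelled with no location.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Event:
    thread: str
    seq: int                   # position within its own thread
    kind: str                  # "call", "ret", "rd" or "wr"
    loc: Optional[str] = None  # memory location for "rd"/"wr", else None


Relation = Set[Tuple[Event, Event]]


def program_order(trace: List[Event]) -> Relation:
    """Pairs (e1, e2) of the same thread with e1 before e2 in the trace."""
    return {(a, b) for a, b in combinations(trace, 2) if a.thread == b.thread}


def conflict_order(trace: List[Event]) -> Relation:
    """Pairs of different threads on the same location, at least one a write."""
    return {(a, b) for a, b in combinations(trace, 2)
            if a.thread != b.thread
            and a.loc is not None and a.loc == b.loc
            and "wr" in (a.kind, b.kind)}


def po_equivalent(t1: List[Event], t2: List[Event]) -> bool:
    return program_order(t1) == program_order(t2)


def equivalent(t1: List[Event], t2: List[Event]) -> bool:
    return po_equivalent(t1, t2) and conflict_order(t1) == conflict_order(t2)


# Three events: T1 writes x; T2 reads y and then x.
w_x = Event("T1", 0, "wr", "x")
r_y = Event("T2", 0, "rd", "y")
r_x = Event("T2", 1, "rd", "x")

trace_a = [w_x, r_y, r_x]
trace_b = [r_y, w_x, r_x]   # swaps two non-conflicting events
trace_c = [r_y, r_x, w_x]   # reverses the conflicting pair on x
```

Here trace_a and trace_b are equivalent, because only accesses to different locations were reordered, while trace_c is po-equivalent to trace_a but not equivalent to it, since the write to x now follows the read of x.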

The projection of a trace τ over call and return actions is called a history and 
denoted by h(τ). A history is sequential when each call action c is immediately 
followed by a return action r with op(c) = op(r). A linearization of a history h1 
is a sequential history h2 that is a permutation of h1 that preserves the order 
between return and call actions, i.e., a given return action occurs before a given 
call action in h1 iff the same holds in h2. 

A library L is a set of traces⁴. A trace τ of a library L is linearizable if 
L contains some sequential trace whose history is a linearization of h(τ). A 
library is linearizable if all its traces are linearizable⁵. In the following, since 
linearizability is used as the main correctness criterion, a bug is a trace τ that is 
not linearizable. 
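
For small histories, linearizability can be checked by brute force. The sketch below is an illustration under stated assumptions and not the monitoring algorithm of any tool cited here: the sequential behaviours are taken to be those of a LIFO stack whose pop returns None when empty (whereas the definition above takes them from the sequential traces of the library), each operation records hypothetical positions of its call and return actions in the history, and all permutations are enumerated, which is exponential in the number of operations.

```python
from itertools import permutations
from typing import NamedTuple, Optional, Sequence


class Op(NamedTuple):
    name: str
    method: str            # "push" or "pop"
    arg: Optional[str]     # pushed value, None for pop
    ret: Optional[str]     # returned value, None for push
    call_pos: int          # position of the call action in the history
    ret_pos: int           # position of the return action in the history


def respects_real_time(order: Sequence[Op]) -> bool:
    """If a returns before b is called, a must precede b in the linearization."""
    pos = {op.name: k for k, op in enumerate(order)}
    return all(pos[a.name] < pos[b.name]
               for a in order for b in order if a.ret_pos < b.call_pos)


def is_sequential_stack_run(order: Sequence[Op]) -> bool:
    """Check the return values against a sequential LIFO stack."""
    stack = []
    for op in order:
        if op.method == "push":
            stack.append(op.arg)
        else:
            expected = stack.pop() if stack else None
            if op.ret != expected:
                return False
    return True


def is_linearizable(ops: Sequence[Op]) -> bool:
    return any(respects_real_time(p) and is_sequential_stack_run(p)
               for p in permutations(ops))


# History of the execution in Fig. 2, with illustrative action positions.
push_b = Op("push(b)", "push", "b", None, 0, 2)
push_a = Op("push(a)", "push", "a", None, 1, 5)
pop_t2 = Op("r2=pop()", "pop", None, "b", 3, 7)
pop_t3 = Op("r3=pop()", "pop", None, "b", 4, 6)
fig2_ok = is_linearizable([push_b, push_a, pop_t2, pop_t3])

# Variant in which the third thread runs last and its pop returns a.
pop_t3_late = Op("r3=pop()", "pop", None, "a", 8, 9)
variant_ok = is_linearizable([push_b, push_a, pop_t2, pop_t3_late])
```

The first history has no linearization because b is pushed once but returned twice; the variant is linearized by the order push(b), pop returning b, push(a), pop returning a.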


4 Intuitively, this corresponds to running a concrete library under a most general client 
that makes an arbitrary number of invocations from an arbitrary number of threads. 

5 Linearizability is typically defined with respect to a sequential ADT. Here, we take 
the simplifying assumption that the ADT is defined by the set of sequential histories 
of the library. This holds for all concurrent libraries that we are aware of. 
4 Linearizability Violations and Their Root Causes 


Given a non-linearizable library, our goal is to identify the root cause of non- 
linearizability in the library code. Let us start by formally describing the state 
space of all such causes and state some properties of the space that will aid 
the understanding of our algorithm. First, our focus is on a specific category 
of causes, namely those that can be removed through the introduction of new 
atomic code blocks to the library code without any other code changes. 


Definition 1 (Non-linearizability Root Cause). For a non-linearizable 
library L, the root cause is formally identified by R, a set of atomic blocks A 
such that L is linearizable with the addition of blocks from A. 


Observe that the set of atomic blocks identified in Definition 1 can concep- 
tually be viewed as blocks of code whose non-atomicity is the root cause of 
non-linearizability and their introduction would repair the library. For the rest 
of this paper, we use the two terminologies interchangeably since for this specific 
class, the two notions perfectly coincide. The immediate question that comes to 
mind is whether Definition 1 is general enough. Observe that since linearizability 
is fundamentally an atomicity type property for individual methods in a library, 
if every single method of the library is declared atomic at the code level, then 
the library is trivially linearizable. The only valid executions of the library are 
the linear (sequential) executions in this case. Therefore, 


Remark 1. Every non-linearizable library can be made linearizable by adding 
atomic code blocks in R according to Definition 1. 


Since there always is a trivial repair, one is interested in finding a good one. 
The quality of a repair is contingent on the amount of parallelism that the 
addition of the corresponding atomic blocks removes from the executions of 
an arbitrary client of the library. Generally, it is understood that the fewer 
the number of introduced atomic blocks and the shorter their length, the more 
permissive they will be in terms of the parallel executions of a client of this 
library. This motivates a simple formal subsumption relationship between repairs 
of a bug. We say an atomic code block b subsumes another atomic code block 
b′, denoted as b ⊒ b′, if and only if b′ is contained within b. 


Definition 2 (Repair Subsumption). A repair R subsumes another repair 
R′, written R ⊒ R′, if and only if for all atomic blocks b′ ∈ R′, there exists an 
atomic block b ∈ R such that b ⊒ b′. 


It is easy to see that ⊒ is a partial order, and combined with the finite set of 
all possible program repairs gives rise to the concept of a set of optimal repairs, 
namely those that do not subsume any other repair. It can be lifted to sets of 
repairs in the natural way: ℛ ⊒ ℛ′ iff ∀R′ ∈ ℛ′, ∃R ∈ ℛ : R ⊒ R′. 


Remark 2. The set of traces of a library L with a repair R is a superset of the 
set of traces of L with the repair R′ if R′ ⊒ R. 
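
As an illustration of the subsumption order, the sketch below encodes an atomic block as a hypothetical triple (method, first line, last line) and a repair as a frozenset of such blocks. The encoding and the function names are assumptions made for this example only.

```python
def block_subsumes(b, b2):
    """b ⊒ b2: block b2 lies within block b (same method, nested line range)."""
    return b[0] == b2[0] and b[1] <= b2[1] and b2[2] <= b[2]


def repair_subsumes(r, r2):
    """r ⊒ r2: every block of r2 is contained in some block of r."""
    return all(any(block_subsumes(b, b2) for b in r) for b2 in r2)


def optimal(repairs):
    """Keep the repairs that do not strictly subsume another repair."""
    return {
        r for r in repairs
        if not any(repair_subsumes(r, o) and not repair_subsumes(o, r)
                   for o in repairs)
    }


small = frozenset({("add", 30, 32)})
big = frozenset({("add", 26, 35)})
other = frozenset({("remove", 50, 52)})
best = optimal({small, big, other})
```

Here `big` is discarded because it subsumes `small`, while `other` is incomparable with both and is kept.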




This means that an optimal repair identification according to Definition 2 
should lead to an optimal amount of parallelism in the library repaired by forcing 
the corresponding code blocks to execute atomically. The goal of our algorithm 
is to identify such a set of optimal repairs. 

Now, let us turn our attention to an algorithmic setup to solve this problem. 
The non-linearizability of a library L is witnessed by a non-empty set of non- 
linearizable traces T. These are the concrete erroneous traces of (a client of) the 
library, for which we intend to identify the repair. 

Note that if τ is a non-linearizable trace, then all the traces τ′ that are 
equivalent to τ are also non-linearizable. Indeed, if τ′ is equivalent to τ, then the 
values that are read in τ′ are the same as in τ⁶, which implies that the return 
values in τ′ are the same as in τ, and therefore, τ′ is non-linearizable when τ is. 

Consider a conceptual oracle, O^L(T), that takes a set T of non-linearizable 
traces of a library L and produces the set of all optimal repairs ℛ such that 
each R ∈ ℛ excludes all the traces that are equivalent to those in T. Then the 
following iterative algorithm produces ℛ for a library L: 


1. Let T = ∅ and ℛ = ∅. 
2. Check if L with the addition of atomic blocks from ℛ is linearizable: 
   - Yes? Return ℛ. 
   - No? Produce a set of non-linearizability witnesses T′ and let T = T ∪ T′. 
3. Call O^L(T) and update the set of repairs ℛ with the result. 
4. Go back to step 2. 


Proposition 1. The above algorithm produces an optimal set of repairs R that 
make its input library linearizable. 
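
The refinement loop can be sketched as follows. Both callables are hypothetical stand-ins: `find_violations` plays the role of the linearizability check and `oracle` plays the role of O^L. The loop only terminates if the oracle makes progress, as argued below.

```python
def root_cause(find_violations, oracle):
    """Alternate between finding witnesses and asking the oracle for repairs."""
    traces, repairs = set(), set()
    while True:
        witnesses = find_violations(repairs)
        if not witnesses:          # the repaired library is linearizable
            return repairs
        traces |= witnesses        # T = T ∪ T'
        repairs = oracle(traces)


# Toy stand-ins: block "A" excludes trace "t1", block "B" excludes trace "t2".
COVERS = {"A": {"t1"}, "B": {"t2"}}


def toy_find_violations(repairs):
    if not repairs:
        return {"t1"}
    # A trace survives if some candidate repair has no block excluding it.
    return {t for t in ("t1", "t2")
            if any(not any(t in COVERS[b] for b in r) for r in repairs)}


def toy_oracle(traces):
    return {frozenset(b for b in COVERS if COVERS[b] & traces)}


result = root_cause(toy_find_violations, toy_oracle)
```

In this toy run the first iteration proposes {A}, the second discovers the remaining witness t2, and the loop stops with the single repair {A, B}.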


It is easy to see that if oracle O^L(T) can be relied on to produce perfect 
results, then the algorithm satisfies a progress property in the sense that 
ℛ_{k+1} ⊒ ℛ_k, where ℛ_k is the value of ℛ in the k-th iteration of the loop. 
Following Remark 1, this chain of increasingly stronger repairs is bounded by the 
specific repair in which every method of the library L has to be declared atomic. 
Therefore, the algorithm converges. The assumption of optimality for O^L(T) 
implies that on the iteration that the algorithm terminates, it will produce the 
optimal ℛ. 

Note that in oracle O^L, the focus shifts from identifying the source of error 
for the entire library to identifying the source of error in a specific set of 
non-linearizability witnesses. First, we propose a solution for implementing O^L for 
a singleton set, i.e., precisely one error trace, and later argue why the solution 
easily generalizes to finitely many error traces. 


4.1 Repair Oracle Approximation 


Given a trace τ as a violation of linearizability, we wish to implement O^L that 
takes a single trace τ and proposes an optimal set of repairs for it. 


6 We assume that program instructions are deterministic, which is usually the case. 




Observe that if every trace of L is conflict serializable [37] (i.e., equivalent to 
a sequential trace), assuming method boundaries as transaction boundaries, then 
it is necessarily linearizable. Therefore, knowing that it is not linearizable, we can 
conclude that there exists some trace of L which is not serializable. Following the 
same line of reasoning, we can conclude that the error trace τ itself is not conflict 
serializable, for some choice of transaction boundaries. This observation is the 
basis of our solution for approximating repairs for non-linearizability through an 
oracle that actively seeks to repair conflict-serializability violations. 


Definition 3 (Trace Eliminator). For an error trace (a bug) τ, a set of 
atomic blocks R is called a trace eliminator if and only if every trace that is 
equivalent to τ is not a trace of the new library with the addition of blocks from 
R. 


Any trace eliminator that removes τ as a valid trace of a client of the library L 
(and all the traces equivalent to τ), by amending the library for the conflict 
serializability violation, (indirectly) eliminates it as a witness to non-linearizability as 
well. Note that the universes of trace eliminators and non-linearizability repairs 
are the same set of objects, and therefore the subsumption relation ⊒ is well 
defined for trace eliminators, and the concept of optimality is similarly defined. 
Moreover, Definition 3 is agnostic to linearizability and can be interchangeably 
used for serializability repairs. 


Theorem 1. R is a trace eliminator for τ if and only if τ is not conflict 
serializable with transaction boundaries that subsume R (statements that are not 
included in the atomic sections from R are assumed to form singleton 
transactions). 


Proof. (Sketch) For the if direction, assume by contradiction that R is not a 
trace eliminator for τ. This implies that there exists a trace τ′ ≡ τ where the 
sequences of events corresponding to the atomic sections in R occur 
uninterrupted (not interleaved with other events). This is a direct contradiction to τ 
not being conflict serializable when transaction boundaries are defined precisely 
by the atomic sections in R. For the only if direction, assume by contradiction 
that τ is conflict serializable. By definition, there is an equivalent trace τ′ where 
the sequences of events corresponding to the atomic sections in R occur 
uninterrupted. Therefore, the library L′ obtained by adding the atomic code blocks 
in R admits τ′, which contradicts the fact that R is a trace eliminator for τ. 


The relationship between the set of trace eliminators for τ and O^L(τ) can be 
made precise. Since every trace eliminator is a linearizability repair by definition, 
but not necessarily an optimal one, we have: 


Proposition 2. Let O^L(τ) represent the optimal set of repairs that eliminate τ 
as a witness to non-linearizability and ℛ be the set of optimal trace eliminators 
for τ. We have ℛ ⊇ O^L(τ). 

This is precisely why the set of trace eliminators safely overapproximates 
the set of linearizability repairs for a single trace. Note that Theorem 1 links 
any trace eliminator (a set of code blocks) to a collection of dynamic (runtime) 
transactions. It is fairly straightforward to see that given the latter as an input, 
the former can be inferred in a way that the dynamic transactions generated by 
the static code blocks are as close as possible to the input transaction boundaries, 
assuming no structural changes occur in the code. In Sect. 5, we discuss how an 
optimal set of dynamic transaction boundaries can be computed, which gives rise 
to a set of optimal trace eliminators. 


4.2 Generalization to Multiple Traces 


If we have an implementation for an oracle O^L(τ) that takes a single trace and 
produces the set of optimal trace eliminators for it, then the following algorithm 
implements an oracle for O^L({τ_1, ..., τ_n}) for any finite number of traces: 


- Let ℛ = ∅. 

- For each τ_i (1 ≤ i ≤ n): let ℛ_i = O^L(τ_i). 

- Let 𝒯 = ℛ_1 × ... × ℛ_n. 

- For each T ∈ 𝒯: let ℛ = ℛ ∪ flatten(T). 

- For each R ∈ ℛ: if ∃R′ ∈ ℛ s.t. R ⊒ R′ then ℛ = ℛ − {R}. 


where flatten(T) basically takes the union of repairs suggested by individual 
components of T while merging any overlapping atomic blocks. Note that the i-th 
component of T suggests an optimal trace eliminator for τ_i. If we want a tight 
combination of all such trace eliminators, we need the minimal set of atomic 
blocks that covers all atomic blocks suggested by each eliminator. Formally: 


$$\mathrm{flatten}((R_1, \ldots, R_n)) = \text{smallest } R \text{ w.r.t. } \sqsupseteq \text{ s.t. } \forall\, 1 \le i \le n : R \sqsupseteq R_i$$


We can then conclude: 


Theorem 2. If O^L(τ) produces the optimal set of trace eliminators for trace 
τ, then the above algorithm correctly implements O^L({τ_1, ..., τ_n}), that is, it 
produces the optimal set of repairs for the set of error traces {τ_1, ..., τ_n}. 
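
A minimal sketch of the combination step, under the assumption that an atomic block is a hypothetical triple (method, first line, last line) and an eliminator is a frozenset of blocks. The helper names `subsumes` and `combine` are introduced here for illustration.

```python
from itertools import product


def subsumes(r, other):
    """r ⊒ other: every block of other lies inside some block of r."""
    return all(any(m == m2 and lo <= lo2 and hi2 <= hi for m, lo, hi in r)
               for m2, lo2, hi2 in other)


def flatten(eliminators):
    """Union the blocks of all eliminators, merging overlapping ranges per method."""
    merged = []
    for method, lo, hi in sorted(b for elim in eliminators for b in elim):
        if merged and merged[-1][0] == method and lo <= merged[-1][2]:
            m, plo, phi = merged[-1]
            merged[-1] = (m, plo, max(phi, hi))
        else:
            merged.append((method, lo, hi))
    return frozenset(merged)


def combine(per_trace):
    """per_trace[i] is the set of optimal eliminators of the i-th trace."""
    candidates = {flatten(choice) for choice in product(*per_trace)}
    # Drop every candidate that strictly subsumes another candidate.
    return {r for r in candidates
            if not any(subsumes(r, o) and not subsumes(o, r) for o in candidates)}


per_trace = [
    {frozenset({("add", 10, 14)}), frozenset({("remove", 30, 33)})},
    {frozenset({("add", 12, 18)})},
]
combined = combine(per_trace)
```

The two overlapping blocks in add are merged into one block covering lines 10 to 18, and the second combination stays incomparable with it.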


5 Conflict-Serializability Repairs 


In this section, we investigate the theoretical properties of conflict serializability 
repairs to provide a setup for an algorithm that implements the oracle O^L for a 
single input trace. The goal of this algorithm is to take a trace τ as an input and 
return the optimal trace eliminators for τ, under the assumption that τ witnesses 
the violation of linearizability. 


5.1 Repairs and Conflict Cycles 


We start by introducing a few formal definitions and some theoretical connections 
that will give rise to an algorithm for identifying an optimal set of atomic blocks 
that can eliminate a trace τ as a witness to violation of conflict serializability. 


Definition 4 (Decompositions and Intervals). A decomposition of a trace 
τ is an equivalence relation D over its set of events such that: 


- D relates only events of the same operation, i.e., if (e₁, e₂) ∈ D, then op(e₁) = 
  op(e₂), and 

- the equivalence classes of D are continuous sequences of events of the 
  same operation, i.e., if (e₁, e₃) ∈ D and {(e₁, e₂), (e₂, e₃)} ⊆ po_τ, then 
  {(e₁, e₂), (e₂, e₃)} ⊆ D. 


The equivalence classes of a decomposition D, denoted by I_{τ,D}, are called 
intervals. 


Observe that the relation ⊒ is a well-defined partial order over the universe of 
all possible intervals (of all possible decompositions) of a trace τ. 


Definition 5 (Interval Graphs). Given a trace τ and a decomposition D, an 
interval graph is defined as G_{τ,D} = (V, E) where the set of vertices V is the set 
of intervals of D and the set of edges E is defined as follows: 


$$E = \{(i, i') \mid i \neq i' \wedge \exists e \in i,\, e' \in i' : (e, e') \in po_\tau \cup cf_\tau\}$$


Since, by definition, each edge in the interval graph is induced by an edge 
from either relation po_τ or cf_τ, but not both, we lift these relations over the 
sets of intervals in the natural way, that is: 


$$(i, i') \in cf_\tau \iff \exists e \in i,\, e' \in i' : e \neq e' \wedge (e, e') \in cf_\tau$$

$$(i, i') \in po_\tau \iff \exists e \in i,\, e' \in i' : e \neq e' \wedge (e, e') \in po_\tau$$

Given an interval graph edge (i, i′) ∈ cf_τ ∪ po_τ, let 


$$tre(i, i') = \{(e, e') \mid e \in i \wedge e' \in i' \wedge (e, e') \in cf_\tau \cup po_\tau\}$$

Figure 3 illustrates an interval graph. Node o_i : e_j denotes an event e_j 
of operation o_i. Events of the same thread are aligned vertically. We draw 
only cf_τ edges since the po_τ edges are implied by the vertical alignment of events. 
Non-singleton intervals of D are i₁ = {e₁, e₂, e₃, e₄}, i₂ = {e₅, e₆} and i₃ = 
{e₇, e₈}. Singleton intervals are identified by the corresponding event identifiers. 
Edges among interval nodes correspond to cf_τ or po_τ. For instance, 
(i₁, i₂) ∈ cf_τ since (e₁, e₆) ∈ cf_τ, e₁ ∈ i₁, and e₆ ∈ i₂. As an example for 
the function tre, we have tre(i₂, i₃) = {(e₅, e₇), (e₅, e₈), (e₆, e₇), (e₆, e₈)} that 
consists of po_τ edges and tre(i₃, i₁) = {(e₈, e₃), (e₈, e₄)} that consists of cf_τ edges. 

Fig. 3. An interval graph. 

For the degenerate decomposition in 
which each event is an interval of size one by itself, the interval graph collapses 
into a trace graph, denoted by G_τ. Note that G_τ is acyclic since the relations 
po_τ and cf_τ are consistent with the order between the events in τ. 

Intervals are closely related to the static notion of transactions and the 
induced transaction boundaries on traces. For example, in the decomposition 
in which the intervals coincide with the boundaries of transactions (e.g. method 
boundaries), it is straightforward to see that the interval graph becomes precisely 
the conflict graph [19] widely known in the conflict serializability literature. It is 
a known fact that a trace is conflict serializable if and only if its conflict graph 
is acyclic [37]. Since τ is not conflict serializable with respect to the boundaries 
of methods from L, we know the interval graph with those boundaries is cyclic. 
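
The following sketch illustrates the acyclicity criterion on a toy trace. It assumes a hypothetical encoding in which an event is a triple (thread, "r" or "w", address) and `txn[i]` names the interval of the i-th event; none of these names come from the paper's tooling.

```python
def conflict_graph(trace, txn):
    """Edges between intervals induced by program order and conflicting accesses."""
    edges = set()
    for i, (tid1, kind1, addr1) in enumerate(trace):
        for j in range(i + 1, len(trace)):
            tid2, kind2, addr2 = trace[j]
            same_thread = tid1 == tid2
            conflict = (not same_thread and addr1 == addr2
                        and "w" in (kind1, kind2))
            if (same_thread or conflict) and txn[i] != txn[j]:
                edges.add((txn[i], txn[j]))
    return edges


def has_cycle(edges):
    """Depth-first search for a back edge."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set())
    state = {}  # node -> "active" while on the DFS stack, "done" afterwards

    def visit(u):
        state[u] = "active"
        for v in graph[u]:
            if state.get(v) == "active" or (v not in state and visit(v)):
                return True
        state[u] = "done"
        return False

    return any(u not in state and visit(u) for u in graph)


# Thread 1 reads and then writes x; thread 2 writes x in between.
trace = [(1, "r", "x"), (2, "w", "x"), (1, "w", "x")]
by_method = has_cycle(conflict_graph(trace, ["m1", "m2", "m1"]))
by_event = has_cycle(conflict_graph(trace, [0, 1, 2]))
```

With method boundaries the graph contains the cycle m1, m2, m1, whereas with singleton intervals it is acyclic, matching the two ends of the spectrum discussed next.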

With intervals set as single events, G_τ is acyclic, and with the intervals 
set at method boundaries, it is cyclic. The high level observation is that there 
exists a decomposition D in the middle of this spectrum, so to speak, such that 
G_{τ,D} is cyclic, but G_{τ,D′} is acyclic for any finer decomposition D′ (that is, 
D ⊒ D′ and D′ ≠ D). In the following we will formally argue why such a 
decomposition D is at the centre of identification of serializability repairs. 

A cycle in a graph is simple if only one vertex is repeated more than once. 


Definition 6 (Critical Segment Sets). Let D be a decomposition such that 
the interval graph G_{τ,D} is cyclic and α = i_0 ... i_{n−1} i_0 be a simple cycle. Define 

$$edges(\alpha) = tre(i_0, i_1) \times tre(i_1, i_2) \times \cdots \times tre(i_{n-1}, i_0)$$

$$segs(\bar{e}) = \{[e^s_{(k+1) \bmod n}, e^t_k] \mid 0 \le k \le n-1\}$$

$$critSegs(\bar{e}) = \{[e, e'] \in segs(\bar{e}) \mid (e, e') \in po_\tau\}$$

$$CritSegs(\alpha) = \{s \mid \exists \bar{e} \in edges(\alpha) : s = critSegs(\bar{e})\}$$

where (e^s_k, e^t_k) denotes the k-th edge of the tuple ē, and the set CritSegs(α) is 
the set of all critical segment sets of cycle α. 


Note that each cycle may induce several different segment sets, determined 
by |edges(α)|. More importantly, each segment set includes at least one critical 
segment. 


Lemma 1. For any ē ∈ edges(α), we have critSegs(ē) ≠ ∅. 


Example 1. In Fig. 3, α = i₁, i₂, i₃, i₁ is a simple cycle. Included in edges(α) are 
the following three cycles and their corresponding segments: 


α¹ = ((e₁, e₆), (e₆, e₇), (e₈, e₃))    segs(α¹) = {[e₁, e₃], [e₆, e₆], [e₈, e₇]} 
α² = ((e₁, e₆), (e₆, e₇), (e₈, e₄))    segs(α²) = {[e₁, e₄], [e₆, e₆], [e₈, e₇]} 
α³ = ((e₁, e₆), (e₅, e₈), (e₈, e₃))    segs(α³) = {[e₁, e₃], [e₅, e₆], [e₈, e₈]} 


The critical segments for these are critSegs(α¹) = {[e₁, e₃]}, critSegs(α²) = 
{[e₁, e₄]} and critSegs(α³) = {[e₁, e₃], [e₅, e₆]}. 
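
The segments of Example 1 can be recomputed mechanically. The sketch below assumes a layout consistent with the description of Fig. 3, namely events e₁ to e₄ on one thread and e₅ to e₈ on another, with events numbered in program order; the thread names are hypothetical.

```python
# Assumed thread layout for the events of Fig. 3 (event number -> thread).
THREAD = {1: "t1", 2: "t1", 3: "t1", 4: "t1", 5: "t2", 6: "t2", 7: "t2", 8: "t2"}


def po(a, b):
    """Program order: a strictly precedes b on the same thread."""
    return THREAD[a] == THREAD[b] and a < b


def segs(edge_tuple):
    """Pair the source of the next edge with the target of the current one."""
    n = len(edge_tuple)
    return {(edge_tuple[(k + 1) % n][0], edge_tuple[k][1]) for k in range(n)}


def crit_segs(edge_tuple):
    """Segments whose end points are ordered by program order."""
    return {(a, b) for a, b in segs(edge_tuple) if po(a, b)}


a1 = ((1, 6), (6, 7), (8, 3))
a2 = ((1, 6), (6, 7), (8, 4))
a3 = ((1, 6), (5, 8), (8, 3))
```

A segment is critical exactly when the cycle leaves an interval at an event that precedes the event at which it entered, so that following the cycle requires going against program order.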


There is a direct connection between the notion of critical segment sets and 
conflict serializability repairs that the following lemma captures. A segment is 
called uninterrupted in a trace τ when all its events occur continuously one after 
another in τ without an interruption from events of another interval. 


Lemma 2. Let α be a cycle in some interval graph G_{τ,D} of trace τ which is 
not conflict serializable w.r.t. the decomposition D and critSeg_α ∈ CritSegs(α). 
There does not exist a trace τ′ which is equivalent to τ in which all segments from 
critSeg_α are uninterrupted. 


The immediate corollary of Lemma 2 is that if one ensures the atomicity of 
the segments of events in CritSegs(α) by adding atomic blocks at the code level, 
then τ can no longer be an execution of the library. In other words, a set of such 
atomic code blocks is precisely a trace eliminator (Definition 3) for τ. 


5.2 A Simple Algorithm 


Lemma 2 and its corollary suggest a simple enumerative algorithm to discover 
the set of all trace eliminators for a buggy trace τ. 


- Let 𝒟 be the set of all decompositions of τ and ℛ = ∅. 
- For each D ∈ 𝒟: 
  • Let C be the set of all simple cycles in G_{τ,D}. 
  • For each α ∈ C: 
    ∗ Let S = CritSegs(α). 
    ∗ ℛ = ℛ ∪ S 
- For each R ∈ ℛ: 
  • If ∃R′ ∈ ℛ : R ⊒ R′ then ℛ = ℛ − {R}. 


Theorem 3. The above algorithm produces the optimal set of trace eliminators 
for a buggy trace τ. 


This theorem is non-trivial, because the set of cycles considered is limited 
to simple cycles and an argument is required for why no optimal solution is 
missed as the result of this limitation. An important point is that any optimal 
trace eliminator R defines a decomposition D where the non-singleton intervals 
are precisely those defined by R such that G_{τ,D} contains a simple cycle α and 
the set of code blocks in R is a member of CritSegs(α). Note that the algorithm 
may end up producing non-ideal solutions in the first loop, and the proof of 
Theorem 3 relies on the argument that all such solutions will be filtered out by 
a proper solution that is guaranteed to exist and that they subsume. 


Example 2. The first loop of the above algorithm includes in ℛ the trace 
eliminators induced by the critical segments mentioned in Example 1. After the last 
loop, however, only critSegs(α¹) = {[e₁, e₃]} will remain in ℛ since the other two 
subsume it. 


The algorithm is obviously very inefficient. There are two levels of 
enumeration: all decompositions and all cycles of each decomposition. Assuming that 
there are O(|po_τ|) events in an operation, then there are O(2^{|po_τ|}) different 
decompositions for it. Assuming that there are O(|T|) operations, we conclude 
that |𝒟| = O(2^{|po_τ|·|T|}). There could be O(2^{|E_τ|}) possible cycles for each 
decomposition where E_τ = po_τ ∪ cf_τ. Therefore, the first loop may generate 
O(2^{2|E_τ|·|T|}) many repairs. The last loop iterates over ℛ and each repair takes 
O(|ℛ|) time. The algorithm operates in time O(2^{4|E_τ|·|T|}). It is exponential both 
in the size of the thread set and the graph. There are many redundancies in the 
output of the first loop, however. These are exploited to propose an optimized 
version of this algorithm. 


5.3 A Sound Optimization 


Consider an arbitrary cycle α in the interval graph G_{τ,D}. If we want to trace the 
cycle α over the trace graph G_τ, we would potentially need additional edges that 
would let us go against the program order inside some intervals that appear on 
α. Let us call the graph extended with such edges G_τ^D. Formally, G_τ^D includes 
all the nodes and edges from a trace graph and incorporates additional edges 
between the events of each interval of D to turn it into a clique⁷, which is by 
definition strongly connected and therefore accommodates the connectivity of 
any event of an interval to another event in it. 

The converse also holds, that is, every simple cycle with at least one conflict 
edge in G_τ^D with the aforementioned additional edges corresponds to a cycle 
in the interval graph G_{τ,D}. Note that the inclusion of at least one conflict edge 
is essential, since every interval graph cycle always includes one such edge 
because the program order relation is acyclic. Formally: 


7 A clique is a complete subgraph of a given graph. 

Lemma 3. For each simple cycle α of G_{τ,D}, there exists a simple cycle α′ of 
G_τ^D that contains at most two events from each interval in α. 


The above lemma can immediately be generalized. Consider the graph G_τ^M 
where M indicates the decomposition whose intervals coincide with the library 
method boundaries. Since for any arbitrary decomposition D, we have M ⊒ D, 
we can conclude that G_τ^M includes all possible additional edges that one 
may want to consider as part of a cycle in an arbitrary G_τ^D for an arbitrary 
decomposition D. Hence, the set of edges of G_τ^M is a superset of the set of edges 
of all graphs G_τ^D for all D. This immediately implies that the set of cycles of 
G_τ^M is a superset of the set of cycles of all such graphs. This fact, combined 
with Lemma 3, leads us to the new simplified algorithm below in place of the one 
in Sect. 5.2: 


- Let ℛ = ∅. 
- Let C′ be the set of all simple cycles in G_τ^M. 
- For each α ∈ C′: 
  • Let S = critSegs(α). 
  • ℛ = ℛ ∪ S 
- For each R ∈ ℛ: 
  • If ∃R′ ∈ ℛ : R ⊒ R′ then ℛ = ℛ − {R}. 


Note that we are slightly bending the definition of critSegs in the above 
algorithm, compared to the one given in Definition 6, since the input cycle there 
is formally a tuple, and here it is simply a list. The function is semantically the 
same, however, and therefore we do not redefine it. 

Observe that every cycle of G_τ^M corresponds to a cycle in some graph G_τ^D for 
some decomposition D. This observation together with Lemma 3 and Theorem 3 
implies the correctness of the above algorithm. Every cycle of every G_τ^D is covered 
by the algorithm, and conversely every cycle considered is valid. 

We can simplify the above algorithm one step further by further limiting the 
set of cycles C′ that need to be enumerated. In graph theory, a chord of a simple 
cycle is an edge connecting two vertices in the cycle which is not part of the cycle. 


Theorem 4. The above algorithm produces the set of optimal trace eliminators 
for τ if C′ is limited to the set of simple chordless cycles of G_τ^M. 


Theorem 4 makes a non-trivial and algorithmically subtle observation. 
Enumerating the set of all simple chordless cycles of G_τ^M is a much simpler algorithmic 
problem to solve compared to the initial one from Sect. 5.2. Lemma 3 supports 
part of this argument since it ensures that all repairs explored in the algorithm 
from Sect. 5.2 are also explored by the above algorithm. For Theorem 4 to hold, 
one needs to additionally argue that the cycles of G_τ^M do not produce any junk, 
that is, each cycle's critical segments correspond to a valid trace eliminator for 
τ. Also, as for simple cycles, CritSegs(α) for a cycle α subsumes CritSegs(α′) for 
any chordless cycle α′ included in α. In Sect. 6.1, we present an algorithm that 
solves the problem of enumerating all cycles in C′ effectively. 


6 Repair List Generation 


In this section, we first give a detailed algorithm that produces the 
set of all optimal trace eliminators. These repairs suggest incomparable optimal 
ways of removing an erroneous trace from the library. We then present a novel 
heuristic that orders this set into a list such that the ones ranked higher in 
the list are more likely to correspond to something that a human programmer 
would identify (amongst the entire set) as the ideal repair. 


6.1 Optimal Repairs Enumeration Algorithm 


In this section, we present an algorithm for enumerating all simple chordless 
cycles in G_τ^M with at least one cf_τ edge, prove its correctness, and formally 
analyze its time complexity. The algorithm is the following: 


- Let C = ∅. 

- For each sequence ᾱ = c_1, c_2, ..., c_n where c_i ∈ cf_τ and 0 < n ≤ |T|: 
  • Let c_i = (e_i^s, e_i^t) for all i ∈ [1, n]. 

  • If (e_i^t, e_{(i mod n)+1}^s) ∈ E^M \ cf_τ for all i ∈ [1, n], and e_i^s and e_j^s 
    belong to different threads for all i, j ∈ [1, n] s.t. i ≠ j: 
    ∗ C = C ∪ {ᾱ} 


It enumerates all non-empty cf_τ sequences of length less than or equal to |T|. 
If the sequence forms a valid simple cycle and visits each thread at most once 
(i.e., there are no two distinct conflict edges such that their end points are on the 
same thread), then it is added to the result set C. Correctness of the algorithm 
relies on the following observation: 


Lemma 4. α is a chordless cycle of G_τ^M with at least one cf_τ edge if and only 
if α visits each thread at most once and it visits at least two threads. 
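
The enumeration can be sketched as follows. The encoding is an assumption of this example: `thread`, `op` and `pos` map each event to its thread, its operation and its position in the trace, and two events are considered linked by a non-conflict edge when they coincide, belong to the same operation, or are ordered by program order. Rotations of the same cycle are reported once.

```python
from itertools import permutations


def chordless_cycles(conflicts, thread, op, pos):
    """Conflict-edge sequences that close into a cycle visiting each thread once."""
    def linked(t, s):
        if t == s or op[t] == op[s]:      # same event, or clique edge within a method
            return True
        return thread[t] == thread[s] and pos[t] < pos[s]   # program order

    found = set()
    for n in range(1, len(set(thread.values())) + 1):
        for seq in permutations(conflicts, n):
            if seq[0] != min(seq):        # keep a single rotation per cycle
                continue
            if len({thread[s] for s, _ in seq}) != n:
                continue                  # some thread is visited twice
            if all(linked(seq[i][1], seq[(i + 1) % n][0]) for i in range(n)):
                found.add(seq)
    return found


def ring(n, k):
    """Assumed ring-shaped family: thread i runs one operation whose k source
    events precede its k target events; sources of i conflict with targets of i+1."""
    thread, op, pos, conflicts = {}, {}, {}, []
    for i in range(n):
        for j in range(k):
            for kind, p in (("s", j), ("r", k + j)):
                e = (kind, i, j)
                thread[e], op[e], pos[e] = i, i, p
            conflicts.append((("s", i, j), ("r", (i + 1) % n, j)))
    return conflicts, thread, op, pos


two_threads = chordless_cycles(*ring(2, 2))
three_threads = chordless_cycles(*ring(3, 2))
```

On this family the sketch finds one cycle for every way of choosing one conflict edge per thread, that is, k^n cycles for n threads with k parallel edges each.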


As a corollary of Lemma 4, we know that a chordless cycle could have at most 
|T| conflict edges. Otherwise, by the pigeonhole principle, at least two conflict 
edges end up in the same thread. Therefore, the algorithm can soundly enumerate 
only sequences of cf_τ edges of length less than or equal to |T|. Moreover, the 
choice of cf_τ edges determines the rest of the edges in the cycle. Therefore, there 
are at most O(|cf_τ|^{|T|}) chordless cycles with at least one cf_τ edge of a graph 
G_τ^M. 

Note that, in general, the number of simple cycles can be exponential in the 
number of edges. This means that enumerating only chordless cycles reduces the 
size asymptotically. In other words, our proposed sound optimization of Sect. 5.3 
is at the root of the polynomial complexity results presented here. 


Fig. 4. G_τ^M with |cf_τ|^{|T|} chordless cycles. (Color figure online) 

Interestingly, this upper bound is not loose. There is a class of traces 
parametrized by |T| such that the number of chordless cycles with at least one 
cf_τ edge is (|cf_τ|/|T|)^{|T|}. Let T = {t_1, ..., t_n} be the set of threads and let G_τ^M 
have k parallel conflict edges between t_i and t_{(i mod n)+1} for all i ∈ [1, n]. 
Moreover, the conflict edges that start from t_i are above the conflict edges that end at 
t_i in terms of program order. This graph is depicted in Fig. 4. To form a cycle, one 
needs to pick one of the k edges between t_i and t_{(i mod n)+1} for all i ∈ [1, n]. So, 
there are k^n cycles. Since k = |cf_τ|/|T|, there are (|cf_τ|/|T|)^{|T|} chordless cycles 
with a conflict edge. 


If we consider |T| as a constant, there are Ω(|cf_τ|^{|T|}) chordless cycles with at 
least one cf_τ edge. We are finally ready to state the main complexity result: 


Theorem 5. The above enumeration algorithm generates all chordless cycles with 
at least one cf_τ edge of G_τ^M in O((|po_τ| + |cf_τ|) · |cf_τ|^{|T|}) time. 


Proof. The loop enumerates all the cf_τ sequences of length at most |T| in 
O(|cf_τ|^{|T|}) time. For each such sequence, it takes O(|po_τ| + |cf_τ|) time to check 
whether this sequence forms a cycle (i.e., whether consecutive conflict edges are 
connected through an E^M \ cf_τ edge) and whether it visits a thread more than once. 
As a consequence, the above bound holds. 


Lastly, there may be as many optimal repairs as there are chordless cycles in 
G_τ^M. Consider the class of traces depicted in Fig. 4. Each chordless cycle with at 
least one cf_τ edge has exactly n critical segments (illustrated in red). Consider 
two distinct chordless cycles α₁ and α₂. There exists a thread t_i such that there is 
a different edge between t_i and t_{(i mod n)+1} in α₁ compared to α₂. Without loss of 
generality, assume that the corresponding edge of α₁ has source and destination 
events that appear before the source and destination events of the corresponding 
edge of α₂ in program order (po_τ). Then, α₁ has a larger critical segment on 
t_i and a smaller critical segment on t_{(i mod n)+1} compared to α₂. Therefore, 
neither set of critical segments subsumes the other, and each chordless cycle with 
at least one cf_τ edge produces an optimal repair. 

This implies that the bound presented in Theorem 5, namely O((|po_τ| + 
|cf_τ|) · |cf_τ|^{|T|}), applies to any other algorithm that outputs all optimal repairs. 


6.2 Ranking Optimal Repairs 


We argued through the example in Sect. 2 and a formal statement in Sect. 4.1 
that not every eliminator of a buggy trace τ is an optimal root cause for 
non-linearizability. All that we know is that they are all optimal trace eliminators. 
As a heuristic to identify optimal linearizability repairs out of a set of trace 
eliminators, we rely on another input in the form of a set Γ of linearizable 
executions, and rank trace eliminators depending on how many linearizable traces 
from Γ they disable, giving preference to trace eliminators that disable fewer 
ones. This heuristic relies on an experimental hypothesis that there are harmless 
cyclic dependencies that occur in linearizable executions. 

Given a buggy trace τ, and a set Γ of linearizable traces, we use the following 
algorithm to rank trace eliminators for τ: 

- Let ℛ be the set of optimal trace eliminators for τ. 
- For each R ∈ ℛ: 

  • Let f(R) = |{τ′ ∈ Γ : R is a trace eliminator for τ′}|. 
- Sort ℛ in ascending order depending on f(R) with R ∈ ℛ. 
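
The ranking step amounts to a sort by the number of disabled linearizable traces. In the sketch below, `eliminates` is a hypothetical predicate standing in for the check of Theorem 1, and the table of disabled traces is invented for the example.

```python
def rank(eliminators, linearizable_traces, eliminates):
    """Order eliminators by how many linearizable traces each one disables."""
    def disabled(r):
        return sum(1 for t in linearizable_traces if eliminates(r, t))
    return sorted(eliminators, key=disabled)


# Toy data: which linearizable traces each eliminator would disable.
DISABLES = {"R1": {"g1", "g2"}, "R2": set(), "R3": {"g2"}}
ranking = rank(["R1", "R2", "R3"], ["g1", "g2"],
               lambda r, t: t in DISABLES[r])
```

Because the sort is stable, eliminators that disable the same number of traces keep their input order and can be treated as tied.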


Since the above algorithm is heuristic in nature, there are no theoretical 
guarantees for the optimality of its results. For instance, its effectiveness depends 
on the set of linearizable traces Γ given as input. We discuss the empirical aspects 
of the underlying hypothesis in more detail in Sect. 7. 


7 Experimental Evaluation 


We demonstrate the efficacy of our approach for computing linearizability root- 
causes on several variations of lock-based concurrent sets/maps from the Syn- 
chrobench repository [21]. We consider three libraries from this repository: two 
linked-list set implementations, with coarse-grain and fine-grain locking, respec- 
tively, and a map implementation based on an AVL tree overlapping with two 
singly-linked lists, and fine-grain locking. We define three non-linearizable vari- 
ations for each library by shrinking one atomic section only in the add method, 
only in the remove method, or an atomic section in each of these two methods. 
For each non-linearizable variation, we use Violat [14] to randomly sample three 
library clients that admit non-linearizable traces⁸. We use Java Pathfinder [44] to 
extract all traces of each client, up to partial-order reduction, partitioning them 
into linearizable and non-linearizable traces. Traces are extracted as sequences 
of call/return events and read/write accesses to explicit memory addresses, asso- 
ciated to line numbers in the source code of each of the API methods. The latter 
is important for being able to map critical segments (which refer to events in a 
trace) to atomic code blocks in the source code. 

In Table 1, we list some quantitative data about our benchmarks, the clients, 
and the non-linearizable variations identified by the line numbers of the mod- 
ified atomic sections (the original libraries can be found in the Synchrobench 
repository). For instance, the first variation of RWLockCoarseGrainedListIntSet 
is obtained by shrinking the atomic section in the add method from lines 
[26, 32/35] to [32, 32/35] (there are two line numbers for the end of the atomic 
section because it ends with an if conditional). 

For each non-linearizable trace τ of a client C, we compute the set of optimal 
trace eliminators for τ using the algorithm in Sect. 5.3 with the cycle enumeration 
described in Sect. 6.1. We then compute the ranking of these trace eliminators 
using as input the set of linearizable traces of C (the restriction to lineariz- 
able traces of the same client is only for convenience). Note that multiple trace 
eliminators can be ranked first since they disable exactly the same number of 
linearizable traces. Also, note that an optimal root-cause can disable a number 


8 These linearizability violations are quite rare. The frequencies reported by Violat in 
the context of a fixed client (when using standard testing) are in the order of 1/1000. 

Table 1. Benchmark data. Column Lib. shows the transformation on the atomic 
section(s) of the original library (we write atomic sections as pairs of line numbers in 
square brackets), Client shows the clients (we abbreviate the names of add, remove, 
and contains to a, r, and c, resp.), Non-lin. Out. shows an outcome (set of return 
values) witnessing non-linearizability (true, false, and null are abbreviated to T, F, 
and N, resp.), # bugs and # valid give the number of non-linearizable and linearizable 
traces extracted using Java Pathfinder, respectively, # ev. and # conf. give the 
average number of events and conflict edges in these traces, Total(s) and Tr. Elim.(s) 
give the clock time in seconds for applying our approach, the latter excluding the Java 
Pathfinder time for extracting traces. 


RWLockCoarseGrainedListIntSet (119 LOC) 


Id|Lib.|Client|Non-lin. Out.|# bugs|# valid|# ev.|# conf.|Total(s)|Tr. Elim.(s)

TI. cen T0); a(1)} I a} F, F, T, T l9 36 j i5 13 5 

ole sy a a(0); a(1)} || {a(0)} TIT 18 18 (6 Tl 4 

ae {a(0); a(0)} || ta(l)} TTT 9 2 3/8 Ti 7 

ne COMOMOMECO) T,F,T,T [9 27 fas [i0 12 4 

N COTONE COO E 18 54 {14 T2 3 

SEGOE CORON T ET: 9 18 39 9 Il 3 

7 ee Se {a(0); 2(0)} || {2(0); a(1)} + F Tie z b h d f 

8 |r: [47, 54/56] ad); r0)} I| 4a(0)} T.F, T 9 27 a f6 II 4 

9 |=[53, 54/56]|{c(0); a(1); r(1); c(0)} |] {c(}F, T, T, F, T[9 18 57 JH 12 3 

OptimisticListSortedSetWaitFreeContains (193 LOC) 

Id|Lib.|Client|Non-lin. Out.|# bugs|# valid|# ev.|# conf.|Total(s)|Tr. Elim.(s)

a ey COM OSIECO): T,T,T 6 33 lea l6 |23 31 

2| e2 s276] (CORO) SECO) TET 1s 237 [o 3 17 8 

3 | 7: 92/56 ayy I] fet); alO)} TFT 6 is 4616 7 3 

TT. s, 50/82) CO: 20 0; (OF EOE, T, T, F, The 18 [61 N17 10 3 

BP Fro, 80/82] Ca): HOF Tl O; a0} T,T,T,T {12 ho |70 (25 22 5 

6 » PATE); a(0)} || (al); r(1)} T,T,T,T |6 12 83 T28 38 16 

7 la: [51, 52/56]|{a(0); a0); r(0)} || {r(0)} T,F, T, T l6 27 l6 |i7 18 6 

8 |[52, 52/56]|{a(0)} I| {a(0); r(1)} T, T, F 9 18 58 J9 T2 5 

g |=: (78, 80/82] {c(1); r(1); a(1)} || {a(1)} F,F,T,T |6 36 59 |8 11 5 

? |179, 80/82] iO): ey 

LogicalOrderingAVL (1092 LOC)

Id|Lib.|Client|Non-lin. Out.|# bugs|# valid|# ev.|# conf.|Total(s)|Tr. Elim.(s)
i eee {a(1,0)} || {a(l,1); 0D} LNT 6 51 93 M BI [24 
zS pen al ory, g COD OD fat): ao TFN e o e e e a 
3 , 269][271, 293] {r(0,0); r(1,0)} || {a(0,0); a(0,1)} |T, F, N, 0 J6 30 126 |42 100 22 
al T433, 45A {a(1,0); a(0,0); r(0,0)} || (1,0)} IN, N, F, T 19 75 152 |40 [593 [37 
5 |" a33. annie uak {a(1,0); r(1,0)} || {x(,0)} ,T,T 9 36 122 J31 77 3 
6 ; MOVED D; (0,1); r(0,0)} || CO,D} IN, T, F, T [9 36 138 33 |92 10 
7 la: [267, 293] {a(0,1); r(0,1)} || {a(0,0)} Dl 6 51 102 [1 19 i6 
8 |—[268, 269][271, 293] fr, a} II {a(1,0); a(I,1); a(1,0)} JT, N,0, N19 39 137 a [53 19 
9 ies rote sas) E&O: 9(0.1)} I {2(0.0); (0.0)} |P, 0, N, T Jo 51 uz hz jit læ 


of linearizable traces. This is true even for the ground truth repair (i.e. a repair 
that a human would identify through manual inspection).
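
The ranking step described above can be sketched as follows. This is only an illustration under stated assumptions: the predicate `disables` and the toy encoding of eliminators and traces as sets are hypothetical stand-ins, not the representation used by the tool. Eliminators that disable fewer linearizable traces are ranked first, and eliminators with equal counts share a rank.

```python
def rank_eliminators(eliminators, linearizable_traces, disables):
    """Return (eliminator, disabled_count, rank) triples, best rank first.

    `disables(e, t)` is a caller-supplied (hypothetical) predicate telling
    whether eliminator e disables trace t. Ties share the same rank.
    """
    counted = [(e, sum(1 for t in linearizable_traces if disables(e, t)))
               for e in eliminators]
    counted.sort(key=lambda pair: pair[1])  # stable sort: fewest disabled first
    ranked, rank, previous = [], 0, None
    for e, n in counted:
        if n != previous:
            rank += 1
            previous = n
        ranked.append((e, n, rank))
    return ranked


# Toy encoding (an assumption): an eliminator is a set of segment ids, a trace
# is a set of segment ids, and a trace is disabled when the two sets intersect.
def toy_disables(eliminator, trace):
    return bool(eliminator & trace)


toy_traces = [{1, 2}, {2, 3}, {4}]
toy_eliminators = [frozenset({5}), frozenset({9}), frozenset({2})]
toy_ranking = rank_eliminators(toy_eliminators, toy_traces, toy_disables)
```

In this toy input the first two eliminators disable no linearizable trace and are both ranked first, mirroring the remark that several trace eliminators can share the top rank.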

The results are presented in Table 2 and are self-explanatory. In the majority 
of cases, the first elements in this ranking are atomic sections which are pre- 
cisely or very close to the expected results, i.e., atomic sections that belong to 
the original (error-free) version of the corresponding library. In some cases, the 
output of our approach is close, but not precisely the expected one. This is only 
due to the particular choice of the client used to generate the traces. In general, 
the quality of the produced repairs (compared to the ground truth) depends on the
types of behaviours of the library that the client exercises. However, if our tool 

Table 2. Experimental data. Column # res. gives the number of different results
(sequences of trace eliminators) returned by our algorithm when applied on each of
the non-linearizable traces of a client, and Tr. Elim. gives the first or the first two
trace eliminators in the ranking obtained with our approach. For each trace eliminator
we give the number of linearizable traces it disables (after →).


RWLockCoarseGrainedListIntSet 


Id|Lib. # res.|Tr. Elim. 
: a: [26, 32/35 i ae s = 
= 132, 32/351 i = = LogicalOrderingAVL 
7 i E s 7 z IdlLib. res) Tr. Elim. 
Ar: 47, 54/56] H ised 5 i i 271, 279], [448, 451] | > 0 
247153, 54/56] H = a = 5 a: [267, 293] 265, 271] | — 27 
ESTE c > [268, 269][271, 293] 271, 279], 448, 451 |] > 0 
q |© eae Sally «| LE a >01] 430, 436 | | > 9 
—[32, 32/35 27, 35| | > 0 2 2 289 290/1 SO 
sh an aie I m, S>? 290, 293], [423, 430 |] +0 
asa Pa , 271, 279], |448, 451 | | > 0 
OptimisticListSortedSet WaitFreeContains 3 2 430, 436 ] 79 
: f 289, 290] | > 0 
Id|Lib. # res.|Tr. Elim. . 
SE Sal 290, 293], [423, 430] ] > 0 
1 2 13, 44], [55,56]] > 0) l4 go (CA6, 453 6 
a: [51, 52/56 as T0 451, 454], [423 ,436] | — 15 
z | 2152, 52/56], 51, 56] | > 15 5 |: (432, 454] 1 ae 2 =e 
55, 56] | > 0 6 |->[433, 434][436, 445] akties 
3 I 51, 56] | > 12 2 436, 450] | =0 
z T EET 430, 436] | + 0 
5 |e: (78, 80/82) |T 78 Bono 7 la: (267, 293] I 271, 279], [448, 451] | > 0 
es i [268, 269][271, 293] 271, 279], [448, 451] | > 0 
3/79, 80/82 78, 80| | > 6 8 : 1 eee 
6 2 Bad, BS SeT 30 r: [432, 454] 430, 436] ] — 30 
78 gol] > 6 g | 71433, 434][436, 445]] 7 271, 279], [448, 451] | + 0 
7 la: [bl, 52/56]|1 78, 80] | > 0 2b5 i201] | ee 
g | 7152, 52/56] 51, 56| | > 12 
r: [78, 80/82 55, 56] | > 0 
9 | —[79, 80/82]]1 51, 56] | > 27 


ranks repair R first, in the context of a client C, then after repairing the library 
according to R the client C produces no linearizability violations. 

The methods in the libraries OptimisticListSortedSetWaitFreeContains and
LogicalOrderingAVL use optimistic concurrency, i.e., unbounded loops that 
restart when certain interferences are detected. This could potentially guide our 
heuristic in the wrong direction of giving the ground truth a lower rank. Indeed, 
a ground truth that concerns statements in the loop body could disable a large 
number of executions which only differ in the number of loop iterations. This, 
however, does not happen for small-size clients (like the ones used in our evalu- 
ation) since the number of invocations is bounded, which bounds the number
of interferences and therefore the number of restarts. 

In short, optimistic concurrency could in principle mislead the ranking heuristic,
but this does not happen for small bounded clients, as witnessed by our
benchmarks.

To conclude, our empirical study demonstrates that given a good client (one 
that exercises the problems in the library properly), our approach is very effective 
in identifying the method at fault and the part of its code that is the root cause
of the linearizability violation. 


8 Related Work 


Linearizability Violations. There is a large body of work on automatic detection
of specific bugs such as data races and atomicity violations, e.g., [18, 40, 41, 45].
The focus of this paper is on linearizability errors. Wing and Gong [47] pro- 
posed an exponential-time monitoring algorithm for linearizability, which was 
later optimized by Lowe [33] and by Horn and Kroening [25]; neither avoided 
exponential-time asymptotic complexity. Burckhardt et al. [4] and Burnim et 
al. [5] implement exponential-time monitoring algorithms in their tools for test- 
ing of concurrent objects in .NET and Java. Emmi and Enea [14,15] introduce the 
tool Violat (used in our experiments) for checking linearizability of Java objects. 


Concurrency Errors. There have been various techniques for fault localiza- 
tion, error explanation, counterexample minimization and bug summarization 
for sequential programs. We restrict our attention to relevant works for concur- 
rent programs. More relevant to our work are those that try to extract simple 
explanations (i.e. root causes) from concurrent error traces. In [30], the authors 
focus on shortening counterexamples in message-passing programs to a set of 
“crucial events” that are both necessary and sufficient to reach the bug. In [27], 
the authors introduce a heuristic to simplify concurrent error traces by reducing 
the number of context-switches. Tools that attempt to minimize the number of 
context switches, such as SimTrace [26] and Tinertia [27], are orthogonal to the 
approach presented in this paper. To gain efficiency and robustness, some works 
rely on simple patterns of bugs for detection and a simple family of matching 
fixes to remove them, e.g., [10,28, 29,38]. Our work is set apart from these works 
by addressing linearizability (in contrast to simple atomicity violation patterns) 
as the correctness property of choice, and by being more systematic in the sense 
that it enumerates all trace eliminators for a given linearizability violation. We 
also present crisp results for the theoretical guarantees behind our approach and 
an analysis of the time complexity. Weeratunge et al. [46] use a set of good 
executions to derive an atomicity “specification”, i.e., pairs of accesses that are 
atomic, and then enforce it using locks. 

There is a large body of work on synchronization synthesis [2, 6-8, 11, 22, 34, 42,
43]. The approaches in [11, 42] are based on inferring synchronization by
constructing and exploring the entire product graph or tableaux corresponding to
a concurrent program. A different group of approaches infer synchronization 
incrementally from traces [43] or generalizations of bad traces [7, 8]. These
techniques [7, 8, 43] also infer atomic sections, but they do not focus on linearizability
as the underlying correctness property but rather on local assertion violations.
Several works investigate the problem of deriving an optimal lock placement 
given as input a program annotated with atomic sections, e.g., [9, 17, 48]. Afix [28]
and ConcurrencySwapper [7] automatically fix concurrency-related errors. The 
latter uses error invariants to generalize a linear error trace to a partially ordered
trace, which is then used to synthesize a fix. 


Linearizability Repairs. Flint [32] is the only approach we know of that 
focuses on repairing non-linearizable libraries, but it has a very specific focus, 
namely fixing linearizability of composed map operations. It uses a different 
approach based on enumeration-based synthesis and it does not rely on concrete 
linearizability bugs. 


Abstract. We describe a technique for systematic testing of multi-threaded
programs. We combine Quasi-Optimal Partial-Order Reduction,
a state-of-the-art technique that tackles path explosion due to
interleaving non-determinism, with symbolic execution to handle data 
non-determinism. Our technique iteratively and exhaustively finds all 
executions of the program. It represents program executions using partial 
orders and finds the next execution using an underlying unfolding seman- 
tics. We avoid the exploration of redundant program traces using cutoff 
events. We implemented our technique as an extension of KLEE and eval- 
uated it on a set of large multi-threaded C programs. Our experiments 
found several previously undiscovered bugs and undefined behaviors in 
memcached and GNU sort, showing that the new method is capable of 
finding bugs in industrial-size benchmarks. 


Keywords: Software testing - Symbolic Execution - Partial-Order 
Reduction 


1 Introduction 


Advances in formal testing and the increased availability of affordable concur- 
rency have spawned two opposing trends: While it has become possible to ana- 
lyze increasingly complex sequential programs in new and powerful ways, many 
projects are now embracing parallel processing to fully exploit modern hard- 
ware, thus raising the bar for practically useful formal testing. In order to make 
formal testing accessible to software developers working on parallel programs, 
two main problems need to be solved. Firstly, a significant portion of the API 
in concurrency libraries such as libpthread must be supported. Secondly, the 
analysis must be accessible to non-experts in formal verification. Currently, this 
niche is mostly occupied by manual and fuzz testing, oftentimes combined with
dynamic concurrency checkers such as ThreadSanitizer [45] or Helgrind [2].

Data non-determinism in sequential and concurrent programs, and scheduling
non-determinism are two major sources of path explosion in program analysis.
Symbolic execution [10, 11, 22, 29, 38] is a technique to reason about input data in
sequential programs. It is capable of dealing with real-world programs.
Partial-Order Reductions (PORs) [5, 19, 20, 41] are a large family of techniques to explore
a reduced number of thread interleavings without missing any relevant behavior.

In this paper we propose a technique that combines symbolic execution and 
a Quasi-Optimal POR [35]. In essence, our approach (1) runs the program using 
a symbolic executor, (2) builds a partial order representing the occurrence of 
POSIX threading synchronization primitives (library functions pthread_*) seen 
during that execution, (3) adds the partial order to an underlying tree-like, 
unfolding [32,41] data structure, (4) computes the first events of the next partial 
orders to explore, and (5) selects a new partial order to explore and starts again. 
We use cutoff events [32] to prune the exploration of different traces that reach 
the same state, thus natively dealing with non-terminating executions. 

We implemented our technique as an extension of KLEE. During the evalua- 
tion of this prototype we found nine bugs (that we attribute to four root causes) 
in the production version of memcached. All of these bugs have since been con- 
firmed by the memcached maintainers and are fixed as of version 1.5.21. Our tool 
handles a significant portion of the POSIX threading API [4], including barriers, 
mutexes and condition variables without being significantly harder to use than 
common fuzz testing tools. 

The main challenge that our approach needs to address is that of scalability 
in the face of an enormous state space. We tackle this challenge by detecting 
whenever any two Mazurkiewicz traces reach the same program state to only 
further explore one of them. Additionally, we exploit the fact that data races 
on non-atomic variables cause undefined behavior in C [25, § 5.1.2.4/35], which 
means that any unsynchronized memory access is, strictly speaking, a bug. By 
adding a data race detection algorithm, we can thereby restrict thread schedul- 
ing decisions to synchronization primitives, such as operations on mutexes and 
condition variables, which significantly reduces the state space. 

This work has three core contributions, the combination of which enables 
the analysis of real-world multi-threaded programs (see also Sect. 6 for related 
work): 


1. A partial-order reduction algorithm capable of handling real-world POSIX 
programs that use an arbitrary number of threads, mutexes and condition
variables. Our algorithm continues analysis in the face of deadlocks. 

2. A cutoff algorithm that recognizes whenever two Mazurkiewicz traces reach 
the same program state, as identified by its actual memory contents. This sig- 
nificantly prunes the search space and even enables the partial-order reduction 
to deal with non-terminating executions. 

3. An implementation that finds real-world bugs. 


We also present an extended, more in-depth version of this paper [42]. 


2 Overview


The technique proposed in this paper can be described as a process of 5 concep- 
tual steps, each of which we describe in a section below: 


THREAD 1
  atomic_int a = in();
  atomic_int c = 3;

THREAD 2
  atomic_int b = c;
  if(a >= 0)
    puts("y");
  else
    puts("n");

(a)

[Panels (b)-(l): diagrams of the partial-order runs, the unfolding, and the exploration steps.]


Fig. 1. A program (a) with its 5 partial-order runs (b-f), its unfolding (g) and the 5
steps used by our algorithm to visit the unfolding (h-l).


2.1 Sequential Executions 


Consider the program shown in Fig. 1a. Assume that all variables are initially
set to zero. The statement a = in() initializes variable a non-deterministically.
A run of the program is a sequence of actions, i.e., pairs (i, s) where i ∈ ℕ
identifies a thread that executes a statement s. For instance, the sequence


σ1 := (1, a=in()), (1, c=3), (2, b=c), (2, a<0), (2, puts("n"))


is a run of Fig. 1a. This run represents all program paths where both statements
of thread 1 run before the statements of thread 2, and where the statement a = 
in() initializes variable a to a negative number. In our notion of run, concurrency 
is represented explicitly (via thread identifiers) and data non-determinism is 
represented symbolically (via constraints on program variables). To keep things 
simple the example only has atomic integers (implicitly guarded by locks) instead 
of POSIX synchronization primitives. 


2.2 Independence Between Actions and Partial-Order Runs 


Many POR techniques use a notion called independence [20] to avoid exploring 
concurrent interleavings that lead to the same state. An independence relation 
associates pairs of actions that commute (running them in either order results in
the same state). For illustration purposes, in Fig. 1 let us consider two actions as 
dependent iff either both of them belong to the same thread or one of them writes 
into a variable which is read/written by the other. Furthermore, two actions will 
be independent iff they are not dependent. 
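
The illustrative dependence relation above can be written down directly. The sketch below is a minimal encoding under assumptions of ours: the `Action` record and the explicit read/write sets attached to each statement of Fig. 1a are hypothetical, chosen only to make the definition executable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    thread: int
    stmt: str
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()


def dependent(a, b):
    """Same thread, or one action writes a variable the other reads or writes."""
    if a.thread == b.thread:
        return True
    return bool(a.writes & (b.reads | b.writes)) or bool(b.writes & (a.reads | a.writes))


def independent(a, b):
    return not dependent(a, b)


# Actions of the program in Fig. 1a, with read/write sets filled in by hand.
a_in = Action(1, "a=in()", writes=frozenset({"a"}))
c_3 = Action(1, "c=3", writes=frozenset({"c"}))
b_c = Action(2, "b=c", reads=frozenset({"c"}), writes=frozenset({"b"}))
a_ge = Action(2, "a>=0", reads=frozenset({"a"}))
```

Here `c=3` and `b=c` are dependent because one writes `c` and the other reads it, whereas `a=in()` and `b=c` touch disjoint variables in different threads and are therefore independent.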

A sequential run of the program can be viewed as a partial order when we 
take into account the independence of actions. These partial orders are known as 
dependency graphs in Mazurkiewicz trace theory [31] and as partial-order runs 
in this paper. Figures 1b to 1f show all the partial-order runs of Fig. 1a. The 
partial-order run associated to the run σ1 above is Fig. 1c. For


σ2 := (2, b=c), (2, a>=0), (1, a=in()), (2, puts("y")), (1, c=3),


we get the partial order shown in Fig. 1f. 


2.3 Unfolding: Merging the Partial Orders 


An unfolding [16,32,37] is a tree-like structure that uses partial orders to rep- 
resent concurrent executions and conflict relations to represent thread interfer- 
ence and data non-determinism. We can define unfolding semantics for programs 
in two conceptual steps: (1) identify isomorphic events that occur in different 
partial-order runs; (2) bind the partial orders together using a conflict relation. 

Two events are isomorphic when they are structurally equivalent, i.e., they 
have the same label (run the same action) and their causal (i.e., happens-before) 
predecessors are (transitively) isomorphic. The number within every event in 
Figs. 1b to 1f identifies isomorphic events. 
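
One way to make the notion of isomorphic events concrete is to give every event a canonical identifier built from its label and the identifiers of its causal predecessors. The sketch below assumes that each partial-order run is supplied as a label map plus a map of immediate causal predecessors; these data structures and the event names are our own illustration, not the tool's representation.

```python
def event_ids(labels, preds):
    """Map each event to a canonical id: (label, ids of immediate predecessors).

    labels: dict event -> label; preds: dict event -> set of immediate causal
    predecessors (the same convention must be used for every run).
    Two events from different runs are isomorphic iff their ids are equal.
    """
    cache = {}

    def ident(e):
        if e not in cache:
            cache[e] = (labels[e], frozenset(ident(p) for p in preds.get(e, ())))
        return cache[e]

    return {e: ident(e) for e in labels}


# Run X: thread 1 runs completely before thread 2 reads c.
labels_x = {"x1": (1, "a=in()"), "x2": (1, "c=3"), "x3": (2, "b=c"), "x4": (2, "a>=0")}
preds_x = {"x2": {"x1"}, "x3": {"x2"}, "x4": {"x3"}}

# Run Z: thread 2 reads c before thread 1 writes it.
labels_z = {"z1": (1, "a=in()"), "z3": (2, "b=c"), "z2": (1, "c=3"), "z4": (2, "a>=0")}
preds_z = {"z2": {"z1", "z3"}, "z4": {"z1", "z3"}}

ids_x = event_ids(labels_x, preds_x)
ids_z = event_ids(labels_z, preds_z)
merged_events = set(ids_x.values()) | set(ids_z.values())
```

Only the event labeled `a=in()` is shared between the two runs; the two `b=c` events have different causal histories and therefore receive different identifiers, so the merged structure contains seven events.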

Isomorphic events from different partial orders can be merged together using 
a conflict relation for the un-merged parts of those partial orders. To understand 
why conflict is necessary, consider the set of events C := {1,2}. It obviously 
represents part of a partial-order run (Fig. 1c, for instance). Similarly, events 
C′ := {1, 8, 9} represent (part of) a run. However, their union C ∪ C′ does not
represent any run, because (1) it does not describe what happens-before relation 
exists between the dependent actions of events 2 and 8, and (2) it executes 
the statement c=3 twice. Unfoldings fix this problem by introducing a conflict 
relation between events. Conflicts are to unfoldings what branches are to trees. 
If we declare that events 2 and 8 are in conflict, then any conflict-free (and 
causally-closed) subset of C ∪ C′ is exactly one of the original partial orders.
This lets us merge the common parts of multiple partial orders without losing 
track of the original partial orders. 

Figure 1g represents the unfolding of the program (after merging all 5 partial- 
order runs). Conflicts between events are represented by dashed red lines. Each 
original partial order can be retrieved by taking a (⊆-maximal) set of events
which is conflict-free (no two events in conflict are in the set) and causally closed 
(if you take some event, then also take all its causal predecessors). 
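
The two conditions above (conflict-free and causally closed) are easy to check mechanically. The following sketch uses a small hypothetical event structure, not the unfolding of Fig. 1g, and enumerates maximal sets by brute force, which is exponential and only meant for illustration.

```python
from itertools import combinations


def is_configuration(events, preds, conflicts):
    """events: set of events; preds: event -> set of immediate causal predecessors;
    conflicts: set of frozenset({e, f}) pairs in direct conflict.

    For a causally closed set, checking direct conflicts is enough, because any
    inherited conflict would involve two predecessors that are also in the set.
    """
    causally_closed = all(preds.get(e, set()) <= events for e in events)
    conflict_free = not any(frozenset((a, b)) in conflicts
                            for a, b in combinations(events, 2))
    return causally_closed and conflict_free


def maximal_configurations(all_events, preds, conflicts):
    configs = [frozenset(s)
               for r in range(len(all_events) + 1)
               for s in combinations(sorted(all_events), r)
               if is_configuration(set(s), preds, conflicts)]
    return [c for c in configs if not any(c < d for d in configs)]


# Hypothetical structure: 1 precedes 2 and 3, which are in conflict;
# 4 succeeds 2 and 5 succeeds 3.
toy_events = {1, 2, 3, 4, 5}
toy_preds = {2: {1}, 3: {1}, 4: {2}, 5: {3}}
toy_conflicts = {frozenset((2, 3))}
toy_maximal = maximal_configurations(toy_events, toy_preds, toy_conflicts)
```

The toy structure has exactly two maximal sets, `{1, 2, 4}` and `{1, 3, 5}`, one per way of resolving the conflict between events 2 and 3.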

For instance, the partial order in Fig. 1d can be retrieved by resolving the 
conflicts between events 1 vs. 14, 2 vs. 8, 10 vs. 12 in favor of, resp., 1, 8, 10. 
Resolving in favor of 1 means that events 14 to 17 cannot be selected, because
they causally succeed 14. Similarly, resolving in favor of 8 and 10 means that 
only events 9 and 11 remain eligible, which hold no conflicts among them—all 
other events are causal successors of either 2 or 12. 


2.4 Exploring the Unfolding 


Since the unfolding represents all runs of the program via a set of compactly- 
merged, prefix-sharing partial orders, enumerating all the behaviors of the pro- 
gram reduces to exploring all partial-order runs represented in its unfolding. Our 
algorithm iteratively enumerates all ⊆-maximal partial-order runs.

In simplified terms, it proceeds as follows. Initially we explore the black 
events shown in Fig. 1h, therefore exploring the run shown in Fig. 1b. We discover 
the next partial order by computing the so-called conflicting extensions of the 
current partial order. These are, intuitively, events in conflict with some event 
in our current partial order but such that all its causal predecessors are in our 
current partial order. In Fig. 1h these are shown in circles, events 8 and 6. 

We now find the next partial order by (1) selecting a conflicting extension, 
say event 6, (2) removing all events in conflict with the selected extension and 
their causal successors, in this case events 4 and 5, and (3) expanding the partial 
order until it becomes maximal, thus exploring the partial order Fig. 1c, shown 
as the black events of Fig. 1i. Next we select event 8 (removing 2 and its causal 
successors) and explore the partial order Fig. 1d, shown as the black events
of Fig. 1j. Note that this reveals two new conflicting extensions that were hidden 
until now, events 12 and 14 (hidden because 8 is a causal predecessor of them, 
but was not in our partial order). Selecting either of the two extensions makes 
the algorithm explore the last two partial orders. 
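
The simplified exploration step can be sketched as two small functions: one computes the conflicting extensions of the current partial order, the other switches to a selected extension and expands again until the set is maximal. The event structure used here is a hypothetical toy example and the greedy expansion order is an arbitrary choice of ours; the actual algorithm is described in Sect. 3.

```python
def conflicting_extensions(config, all_events, preds, conflicts):
    """Events outside config whose causal predecessors are all in config and
    which are in (direct) conflict with some event of config."""
    return {e for e in all_events - config
            if preds.get(e, set()) <= config
            and any(frozenset((e, f)) in conflicts for f in config)}


def switch_to(config, e, all_events, preds, conflicts):
    """Select extension e: drop conflicting events and their causal successors,
    add e, then expand until no further event can be added."""
    removed = {f for f in config if frozenset((e, f)) in conflicts}
    changed = True
    while changed:  # propagate removal to causal successors
        changed = False
        for f in config - removed:
            if preds.get(f, set()) & removed:
                removed.add(f)
                changed = True
    new = set(config - removed) | {e}
    changed = True
    while changed:  # greedy expansion to a maximal set
        changed = False
        for f in sorted(all_events - new):
            if preds.get(f, set()) <= new and not any(
                    frozenset((f, g)) in conflicts for g in new):
                new.add(f)
                changed = True
    return new


# Hypothetical structure: 2 and 3 both follow 1 and are in conflict.
ex_events = {1, 2, 3, 4, 5}
ex_preds = {2: {1}, 3: {1}, 4: {2}, 5: {3}}
ex_conflicts = {frozenset((2, 3))}
ex_first = {1, 2, 4}
ex_extensions = conflicting_extensions(ex_first, ex_events, ex_preds, ex_conflicts)
ex_second = switch_to(ex_first, 3, ex_events, ex_preds, ex_conflicts)
```

Starting from `{1, 2, 4}`, the only conflicting extension is event 3; selecting it removes 2 and its successor 4 and expands to `{1, 3, 5}`.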


2.5 Cutoff Events: Pruning the Unfolding 


When the program has non-terminating runs, its unfolding will contain infi- 
nite partial orders and the algorithm above will not finish. To analyze non- 
terminating programs we use cutoff events [32]. In short, certain events do not 
need to be explored because they reach the same state as another event that 
has been already explored using a shorter (partial-order) run. Our algorithm 
prunes the unfolding at these cutoff events, thus handling terminating and non- 
terminating programs that repeatedly reach the same state. 
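
The idea behind cutoff events can be illustrated with a table that remembers, for every state seen so far, the size of the shortest run known to reach it. This is a deliberately simplified sketch: the state encoding, the size measure, and the toy non-terminating loop are assumptions of ours, and the actual cutoff criterion is defined on events of the unfolding.

```python
def is_cutoff(state, run_size, seen):
    """Return True if `state` was already reached by a strictly shorter run.

    seen: dict mapping a hashable state to the size of the shortest run
    known to reach it; it is updated as a side effect.
    """
    best = seen.get(state)
    if best is not None and best < run_size:
        return True
    if best is None or run_size < best:
        seen[state] = run_size
    return False


def explore_counter_loop(modulus):
    """Explore the non-terminating loop `x = (x + 1) % modulus`, starting at x = 0,
    and stop at the first cutoff. Returns the list of states visited."""
    seen, visited = {}, []
    x, size = 0, 0
    while not is_cutoff(x, size, seen):
        visited.append(x)
        x = (x + 1) % modulus
        size += 1
    return visited
```

Although the loop never terminates, exploration stops as soon as the state `x = 0` is reached a second time by a longer run.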


3 Main Algorithm 


This section formally describes the approach presented in this paper. 


3.1 Programs, Actions, and Runs


Let P := (T, L, C) represent a (possibly non-terminating) multi-threaded POSIX
C program, where T is the set of statements, L is the set of POSIX mutexes
used in the program, and C is the set of condition variables. This is a deliberately
simplified presentation of our program syntax, see [42] for full details. We
represent the behavior of each statement in P by an action, i.e., a pair (i, b) in
A ⊆ ℕ × B, where i ≥ 1 identifies the thread executing the statement and b is
the effect of the statement. We consider the following effects:


B := ({loc} × T) ∪ ({acq, rel} × L) ∪ ({sig} × C × ℕ)
∪ ({bro} × C × 2^ℕ) ∪ ({w1, w2} × C × L)


Below we informally explain the intent of an effect and how actions of different 
effects interleave with each other. In [42] we use actions and effects to define 
a labeled transition system semantics for P. Below we also (informally) define an
independence relation (see Sect. 2.2) between actions. 


Local Actions. An action (i, (loc,t)) represents the execution of a local state- 
ment t from thread i, i.e., a statement which manipulates local variables. For 
instance, the actions labeling events 1 and 3 in Fig. 2b are local actions. Note 
that local actions do not interfere with actions of other threads. Consequently, 
they are only dependent on actions of the same thread. 


Mutex Lock/Unlock. Actions (i, (acq, l)) and (i, (rel, l)) respectively represent
that thread i locks or unlocks mutex l ∈ L. The semantics of these actions correspond
to the so-called NORMAL mutexes in the POSIX standard [4]. Actions
of (acq, l) or (rel, l) effect are only dependent on actions whose effect is an operation
on the same mutex l (acq, rel, w1 or w2, see below). For instance the action
of event 4 (rel) in Fig. 2b depends on the action of event 6 (acq).


Wait on Condition Variables. The occurrence of a pthread_cond_wait(c, l)
statement is represented by two separate actions of effect (w1, c, l) and (w2, c, l).
An action (i, (w1, c, l)) represents that thread i has atomically released the lock l
and started waiting on condition variable c. An action (i, (w2, c, l)) indicates
that thread i has been woken up by a signal or broadcast operation on c and
that it successfully re-acquired mutex l. For instance the action (1, (w1, c, m)) of
event 10 in Fig. 2c represents that thread 1 has released mutex m and is waiting
for c to be signaled. After the signal happens (event 12) the action (1, (w2, c, m))
of event 14 represents that thread 1 wakes up and re-acquires mutex m. An
action (i, (w1, c, l)) is dependent on any action whose effect operates on mutex l
(acq, rel, w1 or w2) as well as signals directed to thread i ((sig, c, i), see below),
lost signals ((sig, c, 0), see below), and any broadcast ((bro, c, W) for any W ⊆ ℕ,
see below). Similarly, an action (i, (w2, c, l)) is dependent on any action whose
effect operates on lock l as well as signals and broadcasts directed to thread i
(that is, either (sig, c, i) or (bro, c, W) when i ∈ W).


Signal/Broadcast on Condition Variables. An action (i, (sig, c, j)), with j ≥ 0,
indicates that thread i executed a pthread_cond_signal(c) statement. If j = 0
then no thread was waiting on condition variable c, and the signal had no effect,
as per the POSIX semantics. We refer to these as lost signals. Example: events 7
and 17 in Fig. 2b and 2d are labeled by lost signals. In both cases thread 1 was
not waiting on the condition variable when the signal happened. However, when
j ≥ 1 the action represents that thread j wakes up by this signal. Whenever
a signal wakes up a thread j ≥ 1, we can always find a (unique) w1 action of
thread j that happened before the signal and a unique w2 action in thread j
that happens after the signal. For instance, event 12 in Fig. 2c signals thread 1,
which went sleeping in the w1 event 10 and wakes up in the w2 event 14. Similarly,
an action (i, (bro, c, W)), with W ⊆ ℕ, indicates that thread i executed a
pthread_cond_broadcast(c) statement and any thread j such that j ∈ W was
woken up. If W = ∅, then no thread was waiting on condition variable c (lost
broadcast). Lost signals and broadcasts on c depend on any action of (w1, c, ·)
effect as well as any non-lost signal/broadcast on c. Non-lost signals and broadcasts
on c that wake up thread j depend¹ on w1 and w2 actions of thread j as
well as any signal/broadcast (lost or not) on the same condition variable.
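
The informal dependence rules of this section can be collected into one predicate. The sketch below is an approximation under stated assumptions: actions are encoded as Python tuples `(thread, effect)`, actions of the same thread are always treated as dependent, and the relation is made symmetric by taking the union of both directions of the rules. The formal definition in [42] is more involved.

```python
def mutex_of(effect):
    """Mutex operated on by an effect, or None."""
    if effect[0] in ("acq", "rel"):
        return effect[1]
    if effect[0] in ("w1", "w2"):
        return effect[2]
    return None


def cond_of(effect):
    """Condition variable used by an effect, or None."""
    return effect[1] if effect[0] in ("w1", "w2", "sig", "bro") else None


def woken(effect):
    """Threads woken up by a signal/broadcast effect (empty set if lost)."""
    if effect[0] == "sig":
        return frozenset() if effect[2] == 0 else frozenset({effect[2]})
    if effect[0] == "bro":
        return frozenset(effect[2])
    return frozenset()


def _depends_on(a, b):
    """One direction of the informal rules: does action a depend on action b?"""
    (i, ea), (j, eb) = a, b
    ka, kb = ea[0], eb[0]
    if i == j:
        return True  # assumption: actions of one thread are always dependent
    if ka == "loc":
        return False
    if ka in ("acq", "rel"):
        return mutex_of(eb) == ea[1]
    if ka == "w1":
        if mutex_of(eb) == ea[2]:
            return True
        if kb == "sig" and eb[1] == ea[1]:
            return eb[2] in (0, i)  # lost signal or signal directed to thread i
        return kb == "bro" and eb[1] == ea[1]
    if ka == "w2":
        if mutex_of(eb) == ea[2]:
            return True
        return kb in ("sig", "bro") and eb[1] == ea[1] and i in woken(eb)
    # ka is "sig" or "bro"
    if cond_of(eb) != ea[1]:
        return False
    targets = woken(ea)
    if not targets:  # lost signal or broadcast
        return kb == "w1" or (kb in ("sig", "bro") and bool(woken(eb)))
    if kb in ("sig", "bro"):
        return True
    return j in targets  # w1/w2 action of a woken thread


def dependent(a, b):
    return _depends_on(a, b) or _depends_on(b, a)
```

For example, `dependent((1, ("rel", "m")), (2, ("acq", "m")))` holds because both actions operate on mutex `m`, while two local actions of different threads are independent.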

A run of P is a sequence of actions in A* which respects the constraints 
stated above for actions. For instance, a run for the program shown in Fig. 2a is 
the sequence of actions which labels any topological order of the events shown 
in any partial order in Fig. 2b to 2e. The sequence below, 


σ′ := (1, (loc, x=in())), (2, (loc, y=1)), (1, (acq, m)),
(1, (loc, x>=0)), (1, (rel, m)), (2, (acq, m))


is a run of Fig. 2a. Naturally, if σ ∈ A* is a run, any prefix of σ is also a run.
Runs explicitly represent concurrency, using thread identifiers, and symbolically 
represent data non-determinism, using constraints, as illustrated by the 1st and 
4th actions of the run above. We let runs(P) denote the set of all runs of P. 

A concrete state of P is a tuple that represents, intuitively, the program 
counters of each thread, the values of all memory locations, the mutexes locked 
by each thread, and, for each condition variable, the set of threads waiting for 
it (see [42] for a formal definition). Since runs represent operations on symbolic 
data, they reach a symbolic state, which conceptually corresponds to a set of 
concrete states of P. 

The state of a run σ, written state(σ), is the set of all concrete states of P
that are reachable when the program executes the run σ. For instance, the run σ′
given above reaches a state consisting of all program states where y is 1, x is
a non-negative number, thread 2 owns mutex m and its instruction pointer is at
line 3, and thread 1 has finished. We let reach(P) := ⋃_{σ ∈ runs(P)} state(σ) denote
the set of all reachable states of P.


1 The formal definition is slightly more complex, see [42] for the details.


3.2 Independence


In the previous section, given an action a ∈ A we informally defined the set of
actions which are dependent on a, therefore indirectly defining an independence 
relation. We now show that this relation is a valid independence [19,41]. Intu- 
itively, an independence relation is valid when every pair of actions it declares as 
independent can be executed in any order while still producing the same state. 

Our independence relation is valid only for data-race-free programs. We say 
that P is data-race-free iff any two local actions a := (i, (loc, t)) and a′ :=
(i′, (loc, t′)) from different threads (i ≠ i′) commute at every reachable state
of P. See [42] for additional details. This ensures that local statements of different
threads of P modify the memory without interfering with each other.
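
Commutation of two local statements at a given state can be checked by running them in both orders and comparing the results. The sketch below is only an illustration: states are modeled as dictionaries and statements as pure functions on them, both of which are assumptions of ours. Data-race freedom requires this check to hold at every reachable state, not just at one.

```python
def commute_at(state, stmt_a, stmt_b):
    """True iff running stmt_a then stmt_b from `state` yields the same state
    as running stmt_b then stmt_a. Statements must not mutate their input."""
    return stmt_b(stmt_a(dict(state))) == stmt_a(stmt_b(dict(state)))


# Hypothetical local statements, each returning a fresh state.
def inc_x(s):
    return {**s, "x": s["x"] + 1}


def set_y(s):
    return {**s, "y": 1}


def double_x(s):
    return {**s, "x": s["x"] * 2}


initial = {"x": 1, "y": 0}
```

Statements that touch different variables, such as `inc_x` and `set_y`, commute; `inc_x` and `double_x` both write `x` and do not commute at `initial`, which corresponds to a data race if they belong to different threads.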


THREAD 1
1 x = in();
2 pthread_mutex_lock(m);
3 if(x < 0)
4   pthread_cond_wait(c, m);
5 pthread_mutex_unlock(m);

THREAD 2
1 y = 1;
2 pthread_mutex_lock(m);
3 pthread_cond_signal(c);
4 pthread_mutex_unlock(m);

(a)

[Panels (b)-(e): diagrams of the four partial-order runs, one of which ends in a deadlock.]


Fig. 2. A program and its four partial-order runs. 


Theorem 1. If P is data-race-free, then the independence relation defined in 
Sect. 3.1 is valid. 


Proof. See [42]. 


Our technique does not use data races as a source of thread interference 
for partial-order reduction. It will not explore two execution orders for the two 
statements that exhibit a data race. However, it can be used to detect and report 
data races found during the POR exploration, as we will see in Sect. 4.4. 


3.3 Partial-Order Runs 


A labeled partial-order (LPO) is a tuple (X, <, h) where X is a set of events,
< ⊆ X × X is a causality (a.k.a., happens-before) relation, and h: X → A labels
each event by an action in A.

A partial-order run of P is an LPO that represents a run of P without 
enforcing an order of execution on actions that are independent. All partial- 
order runs of Fig. 2a are shown in Fig. 2b to 2e. 
Given a run σ of P, we obtain the corresponding partial-order run ℰ_σ :=
(E, <, h) by the following procedure: (1) initialize ℰ_σ to be the only totally
ordered LPO that consists of |σ| events where the i-th event is labeled by the
i-th action of σ; (2) for every two events e, e′ such that e < e′, remove the pair
(e, e′) from < if h(e) is independent from h(e′); (3) restore transitivity in < (i.e.,
if e < e′ and e′ < e″, then add (e, e″) to <). The resulting LPO is a partial-order
run of P.
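
The three-step procedure above can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: events are identified by their position in the run, and the `independent` predicate is a simplified stand-in (an assumption) that treats two actions as dependent when they belong to the same thread or operate on the same lock.

```python
from itertools import combinations

def partial_order_run(run, independent):
    """Return the causality relation of the partial-order run of `run`.

    Events are the positions 0..len(run)-1; the result is a set of
    pairs (i, j) meaning "event i happens before event j".
    """
    n = len(run)
    # Steps (1) and (2): start from the total order i < j and drop every
    # pair whose actions are independent.
    order = {(i, j) for i, j in combinations(range(n), 2)
             if not independent(run[i], run[j])}
    # Step (3): restore transitivity (Warshall-style closure).
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if (i, k) in order and (k, j) in order:
                    order.add((i, j))
    return order

def independent(a, b):
    """Hypothetical independence check on actions (thread, (kind, operand))."""
    (thread_a, (kind_a, op_a)), (thread_b, (kind_b, op_b)) = a, b
    if thread_a == thread_b:
        return False  # actions of one thread keep their program order
    lock_kinds = {"acq", "rel"}
    same_lock = kind_a in lock_kinds and kind_b in lock_kinds and op_a == op_b
    return not same_lock

run = [
    (1, ("loc", "x=in()")),
    (2, ("loc", "y=1")),
    (1, ("acq", "m")),
    (1, ("rel", "m")),
    (2, ("acq", "m")),
]
po = partial_order_run(run, independent)
```

In this example the two local actions at positions 0 and 1 stay unordered, while the pair (0, 4) is first removed in step (2) and then re-added by the transitive closure in step (3).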

Furthermore, the originating run σ is an interleaving of ℰ_σ. Given some
LPO ℰ := (E, <, h), an interleaving of ℰ is the sequence that labels any
topological ordering of ℰ. Formally, it is any sequence h(e_1), ..., h(e_n) such that
E = {e_1, ..., e_n} and e_i < e_j ⟹ i < j. We let inter(ℰ) denote the set of all
interleavings of ℰ. Given a partial-order run ℰ of P, the interleavings inter(ℰ)
have two important properties: every interleaving in inter(ℰ) is a run of P, and
any two interleavings σ, σ′ ∈ inter(ℰ) reach the same state, state(σ) = state(σ′).
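
Enumerating the interleavings of an LPO amounts to enumerating its topological orderings. The following Python sketch does this by backtracking; the three-event LPO and its labels are hypothetical and serve only as an illustration.

```python
def interleavings(events, before, label):
    """Yield the action sequence of every topological ordering of an LPO.

    `before` is a set of pairs (e1, e2) meaning e1 < e2, and `label`
    maps each event to its action.
    """
    def extend(prefix, remaining):
        if not remaining:
            yield [label[e] for e in prefix]
            return
        for e in sorted(remaining):
            # e can come next only if none of its predecessors is pending.
            if not any((p, e) in before for p in remaining):
                yield from extend(prefix + [e], remaining - {e})

    yield from extend([], frozenset(events))

# Hypothetical LPO: event 1 happens before event 3, event 2 is unordered.
label = {1: "a", 2: "b", 3: "c"}
result = list(interleavings({1, 2, 3}, {(1, 3)}, label))
```

Every sequence produced places "a" before "c", and "b" may appear at any position.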


Fig. 3. (a): unfolding of the program in Fig. 2a; (b): its POR exploration tree. 


3.4 Prime Event Structures 


We use unfoldings to give semantics to multi-threaded programs. Unfoldings are 
Prime Event Structures [37], tree-like representations of system behavior that 
use partial orders to represent concurrent interaction. 

Figure 3a depicts an unfolding of the program in Fig. 2a. The nodes are events 
and solid arrows represent causal dependencies: events 1 and 4 must fire before 
8 can fire. The dotted line represents conflicts: 2 and 5 are not in conflict and 
may occur in any order, but 2 and 16 are in conflict and cannot occur in the 
same (partial-order) run. 

Formally, a Prime Event Structure [37] (PES) is a tuple ℰ := (E, <, #, h)
with a set of events E, a causality relation < ⊆ E × E, which is a strict partial
order, a conflict relation # ⊆ E × E that is symmetric and irreflexive, and a
labeling function h: E → A.

The causes of an event, ⌈e⌉ := {e′ ∈ E : e′ < e}, are the least set of events
that must fire before e can fire. A configuration of ℰ is a finite set C ⊆ E that
is causally closed (⌈e⌉ ⊆ C for all e ∈ C) and conflict-free (¬(e # e′) for all
e, e′ ∈ C). We let conf(ℰ) denote the set of all configurations of ℰ. For any e ∈ E,
the local configuration of e is defined as [e] := ⌈e⌉ ∪ {e}. In Fig. 3a, the set {1, 2} is
a configuration, and in fact it is a local configuration, i.e., [2] = {1, 2}. The local
configuration of event 6 is {1, 2, 3, 4, 5, 6}. Set {2, 5, 16} is not a configuration,
because it is neither causally closed (1 is missing) nor conflict-free (2 # 16).
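
The two conditions on configurations translate directly into code. The sketch below is illustrative only; it encodes a four-event fragment (events 1, 2, 5 and 16) using just the relations that Sects. 3.4 and 3.7 state or imply for Fig. 3a, and omits the rest of the unfolding.

```python
def causes(e, lt):
    """Strict causal predecessors of e; `lt` is assumed transitively closed."""
    return {a for (a, b) in lt if b == e}

def is_configuration(C, lt, conflict):
    """Check that C is causally closed and conflict-free."""
    closed = all(causes(e, lt) <= C for e in C)
    conflict_free = not any((a, b) in conflict or (b, a) in conflict
                            for a in C for b in C)
    return closed and conflict_free

# Fragment of Fig. 3a: [2] = {1, 2}, [16] = {1, 5, 16}, and 2 # 16.
lt = {(1, 2), (1, 16), (5, 16)}
conflict = {(2, 16)}

ok_local = is_configuration({1, 2}, lt, conflict)
not_closed = is_configuration({2, 5, 16}, lt, conflict)
ok_other = is_configuration({1, 5, 16}, lt, conflict)
```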


3.5 Unfolding Semantics for Programs 


Given a program P, in this section we define a PES U_P such that every
configuration of U_P is a partial-order run of P.

Let ℰ_1 := (E_1, <_1, h_1), ..., ℰ_n := (E_n, <_n, h_n) be the collection of all the
partial-order runs of P. The events of U_P are the equivalence classes of the
structural equality relation that we intuitively described in Sect. 2.3.

Two events are structurally equal iff their canonical name is the same.
Given some event e ∈ E_i in some partial-order run ℰ_i, the canonical name
cn(e) of e is the pair (a, H) where a := h_i(e) is the executed action and
H := {cn(e′) : e′ <_i e} is the set of canonical names of those events that causally
precede e in ℰ_i. Intuitively, canonical names indicate that action h(e) runs
after the (transitively canonicalized) partially-ordered history preceding e. For
instance, in Fig. 3a for events 1 and 6 we have cn(1) = ((1, (loc, x=in())), ∅) and
cn(6) = ((2, (acq, m)), {cn(1), cn(2), cn(3), cn(4), cn(5)}). Actually, the number
within every event in Fig. 2b to 2e identifies (is in bijective correspondence with)
its canonical name. Event 19 in Fig. 2d is the same event as event 19 in Fig. 2e
because it fires the same action ((1, (acq, m))) after the same causal history
({1, 5, 16, 17, 18}). Event 2 in Fig. 2c and 19 in Fig. 2d are not the same event
because, while h(2) = h(19) = (1, (acq, m)), they have a different causal
history ({1} vs. {1, 5, 16, 17, 18}). Obviously events 4 and 6 in Fig. 2b are different
because h(4) ≠ h(6). We can now define the unfolding of P as the only PES
U_P := (E, <, #, h) such that


- E := {cn(e) : e ∈ E_1 ∪ ... ∪ E_n} is the set of canonical names of all events;

- Relation < ⊆ E × E is the union <_1 ∪ ... ∪ <_n of all happens-before relations;

- Any two events e, e′ ∈ E of U_P are in conflict, e # e′, when e ≠ e′, and
¬(e < e′), and ¬(e′ < e), and h(e) is dependent on h(e′).


Figure 3a shows the unfolding produced by merging all 4 partial-order runs
in Fig. 2b to 2e. Note that the configurations of U_P are partial-order runs of P.
Furthermore, the ⊆-maximal configurations are exactly the 4 originating partial
orders. It is possible to prove that U_P is a semantics of P. In [42] we show that
(1) U_P is uniquely defined, (2) any interleaving of any local configuration of U_P
is a run of P, (3) for any run σ of P there is a configuration C of U_P such that
σ ∈ inter(C).
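
Canonical names can be computed recursively, which makes structural equality a plain value comparison. The following Python sketch is an illustration under assumptions: the three runs and their event identifiers are hypothetical and were chosen only so that the same action appears once with an identical and once with a different causal history.

```python
def canonical_name(e, label, lt, memo=None):
    """Return cn(e) = (h(e), {cn(e') : e' < e}) as a hashable value."""
    if memo is None:
        memo = {}
    if e not in memo:
        history = frozenset(canonical_name(p, label, lt, memo)
                            for (p, q) in lt if q == e)
        memo[e] = (label[e], history)
    return memo[e]

# Actions are (thread, (effect, operand)) pairs.
x_in = (1, ("loc", "x=in()"))
y_set = (2, ("loc", "y=1"))
acq_m = (1, ("acq", "m"))

# Three hypothetical partial-order runs with their own event identifiers.
run_a = {"label": {"a1": x_in, "a2": acq_m}, "lt": {("a1", "a2")}}
run_b = {"label": {"b1": x_in, "b2": y_set, "b3": acq_m},
         "lt": {("b1", "b3")}}
run_c = {"label": {"c1": x_in, "c2": y_set, "c3": acq_m},
         "lt": {("c1", "c3"), ("c2", "c3")}}

cn_a = canonical_name("a2", run_a["label"], run_a["lt"])
cn_b = canonical_name("b3", run_b["label"], run_b["lt"])
cn_c = canonical_name("c3", run_c["label"], run_c["lt"])
```

Events a2 and b3 receive the same canonical name and would be merged into one event of the unfolding, whereas c3 fires the same action after a different history and remains a distinct event.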
3.6 Conflicting Extensions


Our technique analyzes P by iteratively constructing (all) partial-order runs 
of P. In every iteration we need to find the next partial order to explore. We 
use the so-called conflicting extensions of a configuration to detect how to start 
a new partial-order run that has not been explored before. 

Given a configuration C of U_P, an extension of C is any event e ∈ E \ C such
that all the causal predecessors of e are in C. We denote the set of extensions of C
as ex(C) := {e ∈ E : e ∉ C ∧ ⌈e⌉ ⊆ C}. The enabled events of C are extensions
that can form a larger configuration: en(C) := {e ∈ ex(C) : C ∪ {e} ∈ conf(ℰ)}.
For instance, in Fig. 3a, the (local) configuration [6] has 3 extensions, ex([6]) =
{7, 9, 16}, of which, however, only event 7 is enabled: en([6]) = {7}. Event 19 is
not an extension of [6] because 18 is a causal predecessor of 19, but 18 ∉ [6]. A
conflicting extension of C is an extension for which there is at least one e′ ∈ C
such that e # e′. The (local) configuration [6] from our previous example has two
conflicting extensions, events 9 and 16. A conflicting extension is, intuitively, an
incompatible addition to the configuration C, an event e that cannot be executed
together with C (without removing e′ and its causal successors from C). We
denote by cex(C) the set of all conflicting extensions of C, which coincides with
the set of all extensions that are not enabled: cex(C) := ex(C) \ en(C).
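
The three sets ex(C), en(C) and cex(C) can be computed directly from their definitions when the event structure is given explicitly. The Python sketch below assumes that C is already a configuration and uses a hypothetical five-event structure (not the one of Fig. 3a).

```python
def causes(e, lt):
    """Strict causal predecessors of e; `lt` is assumed transitively closed."""
    return {a for (a, b) in lt if b == e}

def extensions(C, events, lt):
    """ex(C): events outside C whose causes all lie in C."""
    return {e for e in events if e not in C and causes(e, lt) <= C}

def enabled(C, events, lt, conflict):
    """en(C): extensions that are not in conflict with any event of C."""
    def clash(a, b):
        return (a, b) in conflict or (b, a) in conflict
    return {e for e in extensions(C, events, lt)
            if not any(clash(e, c) for c in C)}

def conflicting_extensions(C, events, lt, conflict):
    """cex(C): extensions that are not enabled."""
    return extensions(C, events, lt) - enabled(C, events, lt, conflict)

# Hypothetical structure: 1 < 2 < 4 < 5, 1 < 3, and 2 # 3.
events = {1, 2, 3, 4, 5}
lt = {(1, 2), (1, 3), (1, 4), (2, 4), (1, 5), (2, 5), (4, 5)}
conflict = {(2, 3)}
C = {1, 2}

ex_C = extensions(C, events, lt)
en_C = enabled(C, events, lt, conflict)
cex_C = conflicting_extensions(C, events, lt, conflict)
```

Event 5 is not an extension of C because its cause 4 is missing from C, and event 3 is a conflicting extension because it is in conflict with event 2.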


Algorithm 1: Conflicting extensions for acq/w2 events.

 1 Function cex-acq-w2(e)
 2   Assume that e is ((i, (acq, l)), K) or ((i, (w2, c, l)), K)
 3   R := ∅
 4   e_t := last-of(K, i)
 5   if effect(e) = (acq, l) then
 6     P := [e_t]
 7   else
 8     e_s := last-notify(e, c, i)
 9     P := [e_t] ∪ [e_s]
10   e_m := last-lock(P, l)
11   e_r := last-lock(K, l)
12   if e_m = e_r then return R
13   if e_m = ⊥ ∨ effect(e_m) ∈ {(rel, l), (w1, ·, l)} then
14     Add (h(e), P) to R
15   foreach event e′ ∈ K \ (P ∪ {e_r}) do
16     if effect(e′) ∈ {(rel, l), (w1, ·, l)} then
17       Add (h(e), P ∪ [e′]) to R
18   return R


Our technique discovers new conflicting extension events by trying to reverse
the causal order of certain events in C. Owing to space limitations we only
explain how the algorithm handles events of acq and w2 effect ([42] presents the
remaining 4 procedures of the algorithm). Algorithm 1 shows the procedure that
handles this case. It receives an event e of acq or w2 effect (line 2). We build and
return a set of conflicting extensions, stored in variable R. Events are added to R
in lines 14 and 17. Note that we define events using their canonical name. For
instance, in line 14 we add a new event whose action is h(e) and whose causal
history is P. Note that we only create events that execute action h(e).
Conceptually speaking, the algorithm simply finds different causal histories (variables P
and e′) within the set K = ⌈e⌉ to execute action h(e).

Procedure last-of(C, i) returns the only <-maximal event of thread i in C;
last-notify(e, c, i) returns the only immediate <-predecessor e′ of e such that
the effect of h(e′) is either (sig, c, i) or (bro, c, S) with i ∈ S; finally, procedure
last-lock(C, l) returns the only <-maximal event that manipulates lock l in C
(an event of effect acq, rel, w1 or w2), or ⊥ if no such event exists. See [42] for
additional details.


Algorithm 2: Main algorithm. See Sect. 3.7.

 1 Global variables: U := ∅ (set of events of U_P) and N := ∅ (set of tree nodes)

 2 Procedure explore()
 3   nod(∅, ∅, ∅)
 4   repeat
 5     Select n := (C, D, A, e) from N
 6     Add cex(C) to U
 7     if ena(C) ⊆ D then
 8       continue
 9     if n has no left child then
10       n′ := nod(C ∪ {e}, D, A \ {e})
11       Make n′ the left child of n
12     if n has no right child then
13       J := alt(C, D ∪ {e})
14       if J ≠ ∅ then
15         n′ := nod(C, D ∪ {e}, J \ C)
16         Make n′ the right child of n
17   until fixed point (N is stable)

18 Function nod(C, D, A)
19   if A ≠ ∅ then
20     e := select from ena(C) ∩ A
21   else
22     e := select from ena(C) \ D
23   n := (C, D, A, e)
24   Add n to N
25   return n

26 Function ena(C)
27   return {e ∈ en(C) : ¬cutoff(e)}

28 Function alt(C, D)
29   Let e be some event in D ∩ en(C)
30   S := {e′ ∈ U : e′ # e ∧ [e′] ∩ D = ∅}
31   S := {e′ ∈ S : [e′] ∪ C is a config.}
32   if S = ∅ then return ∅
33   Select some event e′ from S
34   return [e′]


3.7 Exploring the Unfolding 


This section presents an algorithm that explores the state space of P by
constructing all maximal configurations of U_P. In essence, our procedure is
an improved Quasi-Optimal POR algorithm [35], where the unfolding is not
explored using a DFS traversal, but a user-defined search order. This enables
us to build upon the preexisting exploration heuristics (“searchers”) in KLEE
rather than having to follow a strict DFS exploration of the unfolding.

Our algorithm explores one configuration of U_P at a time and organizes the
exploration into a binary tree. Figure 3b shows the tree explored for the unfolding
shown in Fig. 3a. A tree node is a tuple n := (C, D, A, e) that represents both
the exploration of a configuration C of U_P and a choice to execute, or not, event
e ∈ en(C). Both D (for disabled) and A (for add) are sets of events.

The key insight of this tree is as follows. The subtree rooted at a given node n
explores all configurations of U_P that include C and exclude D, with the following
constraint: n’s left subtree explores all configurations including event e and n’s
right subtree explores all configurations excluding e. Set A is used to guide the
algorithm when exploring the right subtree. For instance, in Fig. 3b the subtree
rooted at node n := ({1, 2}, ∅, ∅, 3) explores all maximal configurations that
contain events 1 and 2 (namely, those shown in Fig. 2b and 2c). The left subtree
of n explores all configurations including {1, 2, 3} (Fig. 2b) and the right subtree
all of those including {1, 2} but excluding 3 (Fig. 2c).

Algorithm 2 shows a simplified version of our algorithm. The complete
version, in [42], specifies additional details including how nodes are selected for
exploration and how they are removed from the tree. The algorithm constructs
and stores the exploration tree in the variable N, and the set of currently known
events of U_P in variable U. At the end of the exploration, U will store all events
of U_P and the leaves of the exploration tree in N will correspond to the maximal
configurations of U_P.

The tree is constructed using a fixed-point loop (line 4) that repeats the
following steps as long as they modify the tree: select a node (C, D, A, e) in the
tree (line 5), extend U with the conflicting extensions of C (line 6), check if the
configuration is ⊆-maximal (line 7), in which case there is nothing left to do,
then try to add a left (line 9) or right (line 12) child node.

The subtree rooted at the left child node will explore all configurations that
include C ∪ {e} and exclude D (line 10); the right subtree will explore those
including C and excluding D ∪ {e} (line 15), if any of them exists, which we
detect by checking (line 14) if we found a so-called alternative [41].

An alternative is a set of events which witnesses the existence of some
maximal configuration in U_P that extends C without including D ∪ {e}. Computing
such a witness is an NP-complete problem, so we use an approximation called
k-partial alternatives [35], which can be computed in polynomial time and works
well in practice. Our procedure alt specifically computes 1-partial alternatives: it
selects k = 1 event e from D ∩ en(C), searches for an event e′ in conflict with e
(we have added all known candidates in line 6, using the algorithms of Sect. 3.6)
that can extend C (i.e., such that C ∪ [e′] is a configuration), and returns its
local configuration [e′]. When such an event e′ is found (line 33), some events in
its local configuration [e′] become the A-component of the right child node
(line 15), and the leftmost branch rooted at that node will re-execute those events
(as they will be selected in line 20), guiding the search towards the witnessed
maximal configuration.
For instance, in Fig. 3b, assume that the algorithm has selected node n =
({1}, ∅, ∅, 2) at line 5 when event 16 is already in U. Then a call to alt({1}, {2})
is issued at line 13, event e = 2 is selected at line 29 and event e′ = 16 gets
selected at line 33, because 2 # 16 and [16] ∪ {1} is a configuration. As a
result, node n′ = ({1}, {2}, {5, 16}, 5) becomes the right child of n in line 15,
and the leftmost branch rooted at n′ adds {5, 16} to C, leading to the maximal
configuration of Fig. 2d.
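
The call alt({1}, {2}) from this example can be replayed with a small Python sketch of the 1-partial alternative search. It is a simplified illustration rather than the prototype's code: the event structure is the four-event fragment implied by the text (events 1, 2, 5 and 16), enabledness is computed over the known events U only, and the helper names are assumptions.

```python
def causes(e, lt):
    return {a for (a, b) in lt if b == e}

def local_config(e, lt):
    return causes(e, lt) | {e}

def clash(a, b, conflict):
    return (a, b) in conflict or (b, a) in conflict

def is_configuration(C, lt, conflict):
    return (all(causes(e, lt) <= C for e in C)
            and not any(clash(a, b, conflict) for a in C for b in C))

def enabled(C, events, lt, conflict):
    return {e for e in events
            if e not in C and causes(e, lt) <= C
            and not any(clash(e, c, conflict) for c in C)}

def alt(C, D, U, lt, conflict):
    """1-partial alternative: a local configuration [e'] or the empty set."""
    blocked = sorted(D & enabled(C, U, lt, conflict))
    if not blocked:
        return set()
    e = blocked[0]                       # some event in D ∩ en(C)
    for candidate in sorted(U):
        J = local_config(candidate, lt)
        if (clash(e, candidate, conflict)          # candidate # e
                and not (J & D)                    # [e'] avoids D
                and is_configuration(J | C, lt, conflict)):
            return J
    return set()

U = {1, 2, 5, 16}
lt = {(1, 2), (1, 16), (5, 16)}
conflict = {(2, 16)}
C, D = {1}, {2}
J = alt(C, D, U, lt, conflict)
A = J - C
```

The returned set J is [16], and J \ C yields the A-component {5, 16} of the right child node described above.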


3.8 Cutoffs and Completeness 


All interleavings of a given configuration always reach the same state, but inter- 
leavings of different configurations can also reach the same state. It is possible 
to exclude certain such redundant configurations from the exploration without 
making the algorithm incomplete, by using cutoff events [32]. 

Intuitively, an event is a cutoff if we have already visited another event that
reaches the same state with a shorter execution. Formally, in Algorithm 2, line
27 we let cutoff(e) return true iff there is some e′ ∈ U such that state([e]) =
state([e′]) and |[e′]| < |[e]|. This makes Algorithm 2 ignore cutoff events and any
event that causally succeeds them. Sect. 4.2 explains how to effectively implement
the check state([e]) = state([e′]).
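
The cutoff predicate can be written as a one-line check once local configurations and their states are available. In the Python sketch below both are given as hypothetical lookup tables for a chain of four events produced by a loop that toggles a single bit.

```python
def is_cutoff(e, U, local_config, state_of):
    """True iff some known event reaches the same state with a smaller [e']."""
    size = len(local_config(e))
    return any(state_of(other) == state_of(e)
               and len(local_config(other)) < size
               for other in U)

# Hypothetical chain 1 < 2 < 3 < 4; state[e] stands for state([e]).
local = {e: set(range(1, e + 1)) for e in (1, 2, 3, 4)}
state = {1: "on", 2: "off", 3: "on", 4: "off"}
cutoffs = {e for e in local
           if is_cutoff(e, local.keys(), local.get, state.get)}
```

Event 3 repeats the state of event 1 with a larger local configuration and is therefore a cutoff. Event 4 also satisfies the predicate, but an exploration that ignores the causal successors of cutoffs would never reach it.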

While cutoffs prevent the exploration of redundant configurations, the anal- 
ysis is still complete: it is possible to prove that every state reachable via a 
configuration with cutoffs is also reachable via a configuration without cutoffs. 
Furthermore, cutoff events not only reduce the exploration of redundant configu- 
rations, but also force the algorithm to terminate for non-terminating programs 
that run on bounded memory. 


Theorem 2 (Correctness). For any reachable state s ∈ reach(P),
Algorithm 2 explores a configuration C such that for some C′ ⊆ C it holds that
state(C′) = s. Furthermore, it terminates for any program P such that reach(P)
is finite.


A proof sketch is available in [42]. Naturally, since Algorithm 2 explores U_P,
and U_P is an exact representation of all runs of P, Algorithm 2 is also
sound: any event constructed by the algorithm (added to set U) is associated
with a real run of P.


4 Implementation 


We implemented our approach on top of the symbolic execution engine KLEE [10], 
which was previously restricted to sequential programs. KLEE already provides 
a minimal POSIX support library that we extended to translate calls to pthread 
functions to their respective actions, enabling us to test real-world multi-threaded 
C programs. We also extended already available functionality to make it thread- 
safe, e.g., by implementing a global file system lock that ensures that concurrent 
reads from the same file descriptor do not result in unsafe behavior. The source 
code of our prototype is available at https://github.com/por-se/por-se. 
4.1 Standby States


When a new alternative is explored, a symbolic execution state needs to be 
computed to match the new node in the POR tree. However, creating it from 
scratch requires too much time and keeping a symbolic execution state around 
for each node consumes significant amounts of memory. Instead of committing to 
either extreme, we store standby states at regular intervals along the exploration 
tree and, when necessary, replay the closest standby state. This way, significantly 
fewer states are kept in memory without letting the replaying of previously 
computed operations dominate the analysis either. 
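
The trade-off between memory and replay time can be illustrated with a toy model. The Python sketch below is a strong simplification and not the prototype's mechanism: it considers a single linear path instead of the exploration tree, the state is a plain tuple, and the function names and the `interval` parameter are assumptions.

```python
def record_standby_states(initial, step, path_length, interval):
    """Execute a path and keep a snapshot every `interval` steps."""
    standby = {0: initial}
    state = initial
    for i in range(path_length):
        state = step(state, i)
        if (i + 1) % interval == 0:
            standby[i + 1] = state
    return standby

def restore(depth, standby, step):
    """Rebuild the state at `depth` from the closest earlier standby state.

    Returns the state and the number of replayed steps.
    """
    base = max(d for d in standby if d <= depth)
    state = standby[base]
    for i in range(base, depth):
        state = step(state, i)
    return state, depth - base

# Toy deterministic "execution": the state is the tuple of steps taken.
step = lambda state, i: state + (i,)
standby = record_standby_states((), step, path_length=10, interval=4)
state, replayed = restore(10, standby, step)
```

With an interval of 4, only three states are stored for a path of length 10, and restoring the state at depth 10 replays two steps from the snapshot at depth 8.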


4.2 Hash-Based Cutoff Events 


Schemmel et al. [43] presented an incremental hashing scheme to identify infinite
loops during symbolic execution. The approach detects when the program
under test can transition from any one state back to that same state. Their 
scheme computes fragments for small portions of the program state, which are 
then hashed individually, and combined into a compound hash by bitwise xor 
operations. This compound hash, called a fingerprint, uniquely (modulo hash 
collisions) identifies the whole state of the program under test. We adapt this 
scheme to provide hashes that identify the concurrent state of parallel programs. 

To this end, we associate each configuration with a fingerprint that describes
the whole state of the program at that point. For example, if the program
state consists of two variables, x = 3 and y = 5, the fingerprint would be
fp = hash("x=3") ⊕ hash("y=5"). When one fragment changes, e.g., from x = 3
to x = 4, the old fragment hash needs to be replaced with the new one. This
operation can be performed as fp′ = fp ⊕ hash("x=3") ⊕ hash("x=4"), as the
duplicate fragments for x = 3 will cancel out. To quickly compute the
fingerprint of a configuration, we annotate each event with an xor of all of these update
operations that were done on its thread. Computing the fingerprint of a
configuration now only requires xor-ing the values from its thread-maximal events,
which will ensure that all changes done to each variable are accounted for, and
cancel out one another so that only the fragment for the last value remains.
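
The xor arithmetic of this example can be checked with a few lines of Python. The sketch is illustrative: the truncated SHA-256 fragment hash and the helper names are assumptions, since the text does not specify which hash function the prototype uses.

```python
import hashlib

def fragment_hash(fragment: str) -> int:
    """64-bit hash of one state fragment (SHA-256 is an arbitrary choice)."""
    digest = hashlib.sha256(fragment.encode()).digest()
    return int.from_bytes(digest[:8], "big")

def update(delta: int, old, new: str) -> int:
    """Fold one write into a thread's running xor of update operations."""
    if old is not None:
        delta ^= fragment_hash(old)  # cancels the previous fragment
    return delta ^ fragment_hash(new)

# Thread 1 writes x=3 and later x=4; thread 2 writes y=5.
thread1 = update(0, None, "x=3")
thread2 = update(0, None, "y=5")
fp_before = thread1 ^ thread2

thread1 = update(thread1, "x=3", "x=4")
fp_after = thread1 ^ thread2
```

After the second write, the value carried by thread 1 contains only the fragment for x = 4, so xor-ing the per-thread values gives the fingerprint of the state x = 4, y = 5.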

Any two local configurations that have the same fingerprint represent the 
same program state; each variable, program counter, etc., has the same value. 
Thus, it is not necessary to continue exploring both—we have found a potential 
cutoff point, which the POR algorithm will treat accordingly (Sect. 3.8). 


4.3 Deterministic and Repeatable Allocations 


KLEE usually uses the system allocator to determine the addresses of objects 
allocated by the program under test. But it also provides a (more) deterministic 
mode, in which addresses are consumed in sequence from a large pre-allocated 
array. Since our hash-based cutoff computation uses memory addresses as part of
the computation, using execution replays from standby states (Sect. 4.1) requires
that we have fully repeatable memory allocation.
We tackle this problem by decoupling the addresses returned by the emulated
system allocator in the program under test from the system allocator of KLEE 
itself. A new allocator requires a large amount of virtual memory in which it 
will perform its allocations. This large virtual memory mapping is not actually 
used unless an external function call is performed, in which case the relevant 
objects are temporarily copied into the region from the symbolic execution state 
for which the external function call is to be performed. Afterwards, the pages 
are marked for reclamation by the OS. This way, allocations done by different 
symbolic execution states return the same address to the program under test. 

While a deterministic allocator by itself would be enough for providing deter- 
ministic allocation to sequential programs, parallel programs also require an allo- 
cation pattern that is independent of which sequentialization of the same partial 
order is chosen. We achieve this property by providing independent allocators for 
each thread (based on the thread id, thus ensuring that the same virtual mem- 
ory mapping is reused for each instance of the same semantic thread). When an 
object is deallocated on a different thread than it was allocated on, its address 
only becomes available for reuse once the allocating thread has reached a point 
in its execution where it is causally dependent on the deallocation. Additionally, 
the thread ids that are used by our implementation are hierarchically defined: A 
new thread t that is the i-th thread started by its parent thread p has the thread 
id t := (p, i), with the main thread being denoted as (1). This way, thread ids and
the associated virtual memory mappings are independent of how the concurrent
creation of multiple threads is sequentialized.
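
The effect of hierarchical thread ids can be demonstrated with a short Python sketch. It is an illustration under assumptions: the `Thread` class is hypothetical, and the nested pair (p, i) is flattened into a tuple for convenience.

```python
class Thread:
    """Thread with a hierarchical id; the main thread is (1,)."""

    def __init__(self, tid=(1,)):
        self.tid = tid
        self.spawned = 0

    def spawn(self):
        """Create the i-th child of this thread with id (parent id, i)."""
        self.spawned += 1
        return Thread(self.tid + (self.spawned,))

def run(order):
    """Let two workers each spawn one child in the given order."""
    main = Thread()
    workers = {"a": main.spawn(), "b": main.spawn()}
    return {name: workers[name].spawn().tid for name in order}

ids_ab = run(["a", "b"])
ids_ba = run(["b", "a"])
```

Both schedules assign the same ids to the two grandchildren, whereas a global counter would number them according to the order in which the creations happen to be sequentialized.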

We have also included various optimizations that promote controlled reuse of 
addresses to increase the chance that a cutoff event (Sect. 4.2) is found, such as 
binning allocations by size, which reduces the chance that temporary allocations 
impact which addresses are returned for other allocations. 


4.4 Data Race Detection 


Our data race detection algorithm simply follows the happens-before relation- 
ships established by the POR. However, its implementation is complicated by 
the possibility of addresses becoming symbolic. Generally speaking, a symbolic 
address can potentially point to any and every byte in the whole address space, 
thus requiring frequent and large SMT queries to be solved. 

To alleviate the quadratic blowup of possibly aliasing accesses, we exploit 
how KLEE performs memory accesses with symbolic addresses: The symbolic 
state is forked for every possible memory object that the access may refer to (and 
one additional time if the memory access may point to unallocated memory). 
Therefore, a symbolic memory access is already resolved to memory object
granularity when it potentially participates in a data race. This drastically reduces
the number of possible data races without querying the SMT solver.
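
At memory-object granularity, a happens-before race check reduces to comparing pairs of accesses. The Python sketch below shows the textbook formulation as an illustration; it is not the prototype's implementation, and the access tuples and the happens-before set are hypothetical.

```python
from itertools import combinations

def find_races(accesses, happens_before):
    """Report pairs of conflicting accesses not ordered by happens-before.

    Each access is (event, thread, memory_object, is_write).
    """
    races = []
    for a, b in combinations(accesses, 2):
        (ev_a, th_a, obj_a, wr_a), (ev_b, th_b, obj_b, wr_b) = a, b
        ordered = ((ev_a, ev_b) in happens_before
                   or (ev_b, ev_a) in happens_before)
        if th_a != th_b and obj_a == obj_b and (wr_a or wr_b) and not ordered:
            races.append((ev_a, ev_b))
    return races

accesses = [
    (1, 1, "buf", True),   # thread 1 writes buf
    (2, 2, "buf", False),  # thread 2 reads buf, unordered with event 1
    (3, 2, "cnt", True),   # thread 2 writes cnt
    (4, 1, "cnt", True),   # thread 1 writes cnt after event 3
]
happens_before = {(3, 4)}
races = find_races(accesses, happens_before)
```

Only the accesses to `buf` are reported, because the two writes to `cnt` are ordered by the happens-before relation.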
4.5 External Function Calls


When a program wants to call a function that is neither provided by the program 
itself nor by the runtime, KLEE will attempt to perform an external function call 
by moving the function arguments from the symbolic state to its own address 
space and attempting to call the function itself. While this support for uninter- 
preted functions is helpful for getting some results for programs which are not 
fully supported by KLEE’s POSIX runtime, it is also inherently incomplete and 
not sound in the general case. Our prototype includes this option as well. 


5 Experimental Evaluation 


To explore the efficacy of the presented approach, we performed a series of exper- 
iments including both synthetic benchmarks from the SV-COMP [9] benchmark 
suite and real-world programs, namely, Memcached [3] and GNU sort [1]. We 
compare against Yogar-CBMC [49], which is the winner of the concurrency safety 
category of SV-COMP 2019 [9], and stands in for the family of bounded model 
checkers. As such, Yogar-CBMC is predestined to fare well in the artificial SV- 
COMP benchmarks, while our approach may demonstrate its strength in dealing 
with more complicated programs. 


Table 1. Our prototype and Yogar-CBMC running SV-COMP benchmarks. Timeout
set at 15 min with maximum memory usage of 15 GB. Columns are: T: true result,
output matches expected verdict; F: false result, output does not match expected
verdict; U: unknown result, tool yields no answer; Time: total time taken; RSS: maximum
resident set size over all benchmarks.


Benchmark             Our tool                          Yogar-CBMC
                      T   F   U   Time     RSS          T   F   U   Time     RSS
pthread               29  -   9   1:50:19  16 GB        29  -   9   0:31:21  948 MB
pthread-driver-races  16  1   4   1:03:08  6049 MB      21  -   -   0:00:12  72 MB


We ran the experiments on a cluster of multiple identical machines with 
dual Intel Xeon E5-2643 v4 CPUs and 256 GiB of RAM. We used a 4h timeout 
and 200 GB maximum memory usage for real-world programs. We used a 15 min 
timeout and 15 GB maximum memory for individual SV-COMP benchmarks. 


5.1 SV-COMP 


We ran our tool and Yogar-CBMC on the “pthread” and “pthread-driver-races” 
benchmark suites in their newest (2020) incarnation. As expected, Table 1 shows 
that Yogar-CBMC clearly outperforms our tool for this specific set of bench- 
marks. Not only does Yogar-CBMC not miscategorize even a single benchmark, 
it does so quickly and without using a lot of memory. Our tool, in contrast, takes 
significantly more time and memory to analyze the target benchmarks. In fact,
several benchmarks do not complete within the 15 min time frame, so our tool
cannot give a verdict for them.

The “pthread-driver-races” benchmark suite contains one benchmark that is 
marked as a failure for our tool in Table 1. For the relevant benchmark, a verdict 
of “target function unreachable” is expected, which we translate to mean “no 
data race occurs”. However, the benchmark program constructs a pointer that 
may point to effectively any byte in memory, which, upon dereferencing it, leads
to both memory errors and data races (by virtue of the pointer also being able
to touch another thread’s stack). While we report this behavior for the sake of
completeness, we attribute it to the adaptations made to fit the SV-COMP model to ours.


Preparation of Benchmark Suites. The SV-COMP benchmark suite does 
not only assume various kinds of special casing (e.g., functions whose name 
begins with _VERIFIER_atomic must be executed atomically), but also routinely 
violates the C standard by, for example, employing data races as a control flow 
mechanism [25, § 5.1.2.4/35]. Partially, this is because the analysis target is a 
question of reachability of a certain part of the benchmark program, not its 
correctness. We therefore attempted to guess the intention of the individual 
benchmarks, making variables atomic or leaving the data race in when it is the 
aim of the benchmark. 


5.2 Memcached 


Memcached [3] is an in-memory network object cache written in C. As it is a 
somewhat large project with a fairly significant state space, we were unable to 
analyze it completely, even though our prototype still found several bugs. Our 
attempts to run Yogar-CBMC did not succeed, as it reproducibly crashes. 


Faults Detected. Our prototype found nine bugs in memcached 1.5.19,
attributable to four different root causes, all of which were previously unknown.
The first bug is a misuse of the pthread API, causing six mutexes and condition
variables to be initialized twice, leading to undefined behavior. We reported²
the issue; a fix is included in version 1.5.20. The second bug occurs during the
initialization of memcached, where fields that will later be accessed in a
thread-safe manner are sometimes accessed in a non-thread-safe manner, assuming that
competing accesses are not yet possible. We reported³ a mistake our tool found in
the initialization order that invalidates the assumption that locking is not (yet)
necessary on one field. A fix ships with memcached 1.5.21. The third bug involves
the maintenance thread that memcached uses to manage and resize its core hash
table when necessary. On another thread, a timer checks whether
the maintenance thread should perform an expansion of the hash table. We


2 https://github.com/memcached/memcached/pull/566.
3 https://github.com/memcached/memcached/pull/575.
found⁴ a data race between these two threads on a field that stores whether
the maintenance thread has started expanding. This is fixed in version 1.5.20.
The fourth and final issue is a data race on the stats_state storing execution
statistics. We reported⁵ this issue and a fix is included in version 1.5.21.


Experiment. We ran our prototype on five different versions of memcached,
the three releases 1.5.19, 1.5.20 and 1.5.21 plus variants of the earlier releases 
(1.5.19+ and 1.5.20+) which include patches for the two bugs we found during 
program initialization. Those variants are included to show performance when 
not restricted by inescapable errors very early in the program execution. 

Table 2 shows clearly how the two initialization bugs may lead to very quick 
analyses—versions 1.5.19 and 1.5.20 are completely analyzed in 7s each, while 
versions 1.5.19+, 1.5.20+ and 1.5.21 exhaust the memory budget of 200 GB. 
We have configured the experiment to stop the analysis once the memory limit 
is reached, although the analysis could continue in an incomplete manner by 
removing parts of the exploration frontier to free up memory. Even though the 
number of error paths in Table 2 differs between configurations, it is notable 
that each configuration can only reach exactly one of the bugs, as execution is 
arrested at that point. When not restricted to the program initialization, the 
analysis of memcached produces hundreds of thousands of events and retires 
hundreds of millions of instructions in less than 2h. 


Table 2. Our prototype analyzing various versions of memcached and GNU sort.
Timeout set at 4 h with maximum memory usage of 200 GB. Columns are: RSS: maximum
resident set size (swap space is not available); #I: number of instructions executed;
Th: maximum number of threads active at the same time; X: total number of events
in the explored unfolding; Mut: number of mutex lock/unlock events; CV: number of
wait1/wait2/signal/broadcast events; A: number of symbolic choices; Cut: number of
events determined to be cutoffs. The Finished Paths columns distinguish between
normal termination of the program under test (Exit), detection of an error (Err), and
being cut off (Cut).


Program                    Performance                Events                        Finished Paths    Halt
           Version  LoC    Time     RSS     #I    Th  X     Mut   CV    A     Cut   Exit  Err   Cut   Reason
Memcached  1.5.19   31065  0:00:07  204 MB  23K   1   12    6     0     3     0     0     1     0     Finished
           1.5.19+  31051  1:33:42  208 GB  1.2B  6   331K  271K  60K   3     24K   0     41K   29K   Memory
           1.5.20   31093  0:00:07  197 MB  92K   2   24    16    0     3     0     0     1     0     Finished
           1.5.20+  31093  1:51:10  207 GB  228M  10  745K  742K  2.7K  5     882   0     1     2.6K  Memory
           1.5.21   31090  1:29:57  207 GB  546M  10  1.1M  1.1M  3.1K  3     558   0     0     2.6K  Memory
Sort       8.31     86596  0:24:29  23 GB   266M  2   1.8M  1.4M  269K  25K   58K   8.0K  4.9K  55K   Finished
           8.31+    86599  4:01:39  88 GB   1.0B  2   6.9M  5.8M  777K  276K  346K  6.3K  0     285K  Time


Our setup delivers a single symbolic packet to memcached followed by a con- 
crete shutdown packet. As this packet can obviously only be processed once the 


4 https://github.com/memcached/memcached/pull/569.
5 https://github.com/memcached/memcached/pull/573.
server is ready to process input, we observe symbolic choices only after program
startup is complete. (Since our prototype builds on KLEE, note that it assumes 
a single symbolic choice during startup, without generating an additional path.) 


5.3 GNU sort 


GNU sort uses threads for speeding up the sorting of very large workloads. We 
reduced the minimum size of input required to trigger concurrent sorting to 
four lines to enable the analysis tools to actually trigger concurrent behavior. 
Nevertheless, we were unable to avoid crashing Yogar-CBMC on this input. 

During analysis of GNU sort 8.31, our prototype detected a data race that
we manually verified but were unable to trigger in a harmful manner. Table 2
shows two variants of GNU sort, the baseline version with eager parallelization 
(8.31) and a version with added locking to prevent the data race (8.31+). 

Surprisingly, version 8.31 finishes the exploration, as all paths either exit,
encounter the data race and are terminated, or are cut off. By fixing the data
race in version 8.31+, we make it possible for the exploration to continue beyond 
this point, which results in a full 4h run that retires a full billion instructions 
while encountering almost seven million unique events. 


6 Related Work 


The body of work in systematic concurrency testing [5,6,19,21,23,35,41,47,50]
is large. These approaches explore thread interleavings under a fixed program 
input. They prune the search space using context-bounding [34], increasingly 
sophisticated PORs [5-7,12,19,23,35,41], or random testing [13,50]. Our main 
difference with these techniques is that we handle input data. 

Thread-modular abstract interpretation [18,30,33] and unfolding-based 
abstract interpretation [46] aim at proving safety rather than finding bugs. 
They use over-approximations to explore all behaviors, while we focus on testing 
and never produce false alarms. Sequentialization techniques [26,36,40] encode a 
multi-threaded program into a sequential one. While these encodings can be very 
effective for small programs [26] they grow quickly with large context bounds (5 
or more, see [36]). However, some of the bugs found by our technique (Sect. 5) 
require many context switches to be reached. 

Bounded model checkers [8,15,28,39,49] for multi-threaded programs encode
multiple program paths into a single logic formula, while our technique encodes 
a single path. Their main disadvantage is that for very large programs, even 
constructing the multi-path formula can be extremely challenging, often pro- 
ducing an upfront failure and no result. Conversely, while our approach faces 
path explosion, it is always able to test some program paths. 

Techniques like [17,27,44] operate on a data structure conceptually very sim- 
ilar to our unfolding. They track read/write operations to every variable, which 
becomes a liability on very large executions. In contrast, we only use POSIX 
synchronization primitives and compactly represent memory accesses to detect 
data races. Furthermore, they do not exploit anything similar to cutoff events
for additional trace pruning. 

Interpolation [14,48] and weakest preconditions [24] have been combined with 
POR and symbolic execution for property-guided analysis. These approaches are 
mostly complementary to PORs like our technique, as they eliminate a different 
class of redundant executions [24]. 

This work builds on top of previous work [35,41,46]. The main contributions 
w.r.t. those are: (1) we use symbolic execution instead of concurrency testing [35, 
41] or abstract interpretation [46]; (2) we support condition variables, providing 
algorithms to compute conflicting extensions for them; and (3) here we use hash- 
based fingerprints to compute cutoff events, thus handling much more complex 
partial orders than the approach described in [46]. 


7 Conclusion 


Our approach combines POR and symbolic execution to analyze programs w.r.t. 
both input (data) and concurrency non-determinism. We model a significant por- 
tion of the pthread API, including try-lock operations and robust mutexes. We 
introduce two techniques to cope with state-space explosion in real-world pro- 
grams. We compute cutoff events by using efficiently-computed fingerprints that 
uniquely identify the total state of the program. We restrict scheduling to syn- 
chronization points and report data races as errors. Our experiments found pre- 
viously unknown bugs in real-world software projects (memcached, GNU sort). 
Abstract. While hardware generators have drastically improved design
productivity, they have introduced new challenges for the task of veri- 
fication. To effectively cover the functionality of a sophisticated gener- 
ator, verification engineers require tools that provide the flexibility of 
metaprogramming. However, flexibility alone is not enough; components 
must also be portable in order to encourage the proliferation of verifica- 
tion libraries as well as enable new methodologies. This paper introduces 
fault, a Python embedded hardware verification language that aims to 
empower design teams to realize the full potential of generators. 


1 Introduction 


The new golden age of computer architecture relies on advances in the design 
and implementation of computer-aided design (CAD) tools that enhance produc- 
tivity [11,21]. While hardware generators have become much more powerful in 
recent years, the capabilities of verification tools have not improved at the same 
pace [12]. This paper introduces fault,¹ a domain-specific language (DSL) that
aims to enable the construction of flexible and portable verification components, 
thus helping to realize the full potential of hardware generators. 

Using flexible hardware generators [1,16] drastically improves the
productivity of the hardware design process, but simultaneously increases verification
cost. A generator is a program that consumes a set of parameters and produces a
hardware module. The scope of the verification task grows with the capabilities
of the generator, since more sophisticated generators can produce hardware with
varying interfaces and behavior. To reduce the cost of attaining functional
coverage of a generator, verification components must be as flexible as their design
counterparts. To achieve flexibility, hardware verification languages must provide
the metaprogramming facilities found in hardware construction languages [1].


1 https://github.com/leonardt/fault.

However, flexibility alone is not enough to match the power of generators; 
verification tools must also enable the construction of portable components. Gen- 
erators facilitate the development of hardware libraries and promote the inte- 
gration of components from external sources. Underlying the utility of these 
libraries is the ability for components to be reused in a diverse set of envi- 
ronments. The dominance of commercial hardware verification tools with strict 
licensing requirements presents a challenge in the development of portable verifi- 
cation components. To encourage the proliferation of verification libraries, hard- 
ware verification languages must design for portability across verification tools. 
Design for portability will also promote innovation in tools by simplifying the 
adoption of new technologies, as well as enable new verification methodologies 
based on unified interfaces to multiple technologies. 

This paper presents fault, a domain-specific language (DSL) embedded in 
Python designed to enable the flexible construction of portable verification
components. Because fault is an embedded DSL, users can employ all of Python’s rich
metaprogramming capabilities in the description of verification components. 
Integration with magma [15], a hardware construction language embedded in 
Python, is an essential feature of fault that enables full introspection of the 
hardware circuit under test. By using a staged metaprogramming architecture, 
fault verification components are portable across a wide variety of open-source 
and commercial verification tools. A key benefit of this architecture is the abil- 
ity to provide a unified interface to constrained random and formal verification, 
enabling engineers to reuse the same component in simulation and model check- 
ing environments. fault is actively used by academic and industrial teams to ver- 
ify digital, mixed-signal, and analog designs for use in research and production 
chips. This paper demonstrates fault’s capabilities by evaluating the runtime 
performance of different tools on a variety of applications ranging in complexity 
from unit tests of a single module to integration tests of a complex design. These 
experiments leverage fault’s portability by reusing the same source input across 
separate trials for each target tool. 


2 Design 


We had three goals in designing fault: enable the construction of flexible 
test components through metaprogramming, provide portable abstractions that 
allow test component reuse across multiple target environments, and support 
direct integration with standard programming language features. The ability 
to metaprogram test components is a vital requirement for scaling verification 
efforts to cover the space of functionality utilized by hardware generators. Porta- 
bility widens the target audience of a reusable component and enhances a design 
team’s productivity by enabling simple migration to different technologies. Inte- 
gration with a programming language enables design teams to leverage standard 
software patterns for reuse as well as feature-rich test automation frameworks. 
Fig. 1. Architectural overview of the fault testing system. In a Python program, the
user constructs a Tester object with a magma Circuit and records a sequence
of test Actions. The compiler uses the action sequence as an intermediate
representation (IR). Backend targets lower the actions IR into a format compatible with the
corresponding tool and provide an API to run the test and report the results.


Figure 1 provides an overview of the system architecture. fault is a DSL 
embedded in Python, a prolific dynamic language with rich support for metapro- 
gramming and a large ecosystem of libraries. fault is designed to work with 
magma [15], a Python embedded hardware construction language which rep- 
resents circuits as introspectable Python objects containing ports, connections, 
and instances of other circuits. While fault and magma separate the concerns of 
design and verification into separate DSLs, they are embedded in the same host 
language for simple interoperability. This multi-language design avoids the com- 
plexity of specifying and implementing a single general purpose language without 
sacrificing the benefits of tightly integrating design and verification code. 

To construct fault test components, the user first instantiates a Tester 
object with a magma circuit as an argument. The user then records a sequence 
of test actions using an API provided by the Tester class. Here is an example 
of constructing a test for a 16-bit Add circuit: 


tester = Tester(Add16)
tester.poke(Add16.in0, 3)
tester.poke(Add16.in1, 2)
tester.eval()
tester.expect(Add16.out, 5)


The poke action (method) sets an input value, the eval action triggers evalua- 
tion of the circuit (the effects of poke actions are not propagated until an eval 
action occurs), and the expect action asserts the value of an output. Attributes 
of the Add16 object refer to circuit ports by name. 

fault’s design is based on the concept of staged metaprogramming [20]; the 
user writes a program that constructs another program to be executed in a 
subsequent stage. In fault, the first stage executes Python code to construct a 
test specification; the second stage invokes a target runtime that executes this 
specification. To run the test for the 16-bit Add, the user simply calls a method 
and provides the desired target: 


tester.compile_and_run("verilator") 
tester.compile_and_run("system-verilog", simulator="iverilog") 
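

The two stages can be illustrated without any simulator. The following is a minimal sketch and not fault's implementation: ToyTester, run_on_model, and the string port names are hypothetical, and a plain Python function stands in for the target runtime. Stage one only records actions; stage two replays them.

```python
class ToyTester:
    """Hypothetical stand-in for a Tester: stage one only records actions."""

    def __init__(self):
        self.actions = []  # the recorded action sequence (a tiny IR)

    def poke(self, port, value):
        self.actions.append(("poke", port, value))

    def eval(self):
        self.actions.append(("eval",))

    def expect(self, port, value):
        self.actions.append(("expect", port, value))

    def run_on_model(self, model):
        """Stage two: replay the recorded actions against a Python model.

        `model` maps a dict of input values to a dict of output values.
        """
        inputs, outputs = {}, {}
        for action in self.actions:
            if action[0] == "poke":
                _, port, value = action
                inputs[port] = value
            elif action[0] == "eval":
                # Pokes only become visible on the outputs after an eval.
                outputs = model(inputs)
            elif action[0] == "expect":
                _, port, value = action
                if outputs.get(port) != value:
                    return False
        return True


def add16(inputs):
    """Reference model of a 16-bit adder (assumed port names in0, in1, out)."""
    return {"out": (inputs["in0"] + inputs["in1"]) & 0xFFFF}


tester = ToyTester()
tester.poke("in0", 3)
tester.poke("in1", 2)
tester.eval()
tester.expect("out", 5)

passed = tester.run_on_model(add16)
```

Nothing touches the model while the test is being described, so the same recorded sequence could in principle be handed to a different replay function.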
By applying staged metaprogramming, fault allows the user to leverage the
full capabilities of the Python host language in the programmatic construction 
of test components. For example, a test can use a native for loop to construct a 
sequence of actions using the built-in random number library and integer type: 


import random

for _ in range(32):
    N = (1 << 16) - 1
    in0, in1 = random.randint(0, N), random.randint(0, N)
    tester.poke(Add16.in0, in0)
    tester.poke(Add16.in1, in1)
    tester.eval()
    tester.expect(Add16.out, (in0 + in1) & N)


Python for loops are executed during the first stage of computation and are 
effectively “unrolled” into a flat sequence of actions. Other control structures 
such as while loops, if statements, and function calls are handled similarly. 

Python’s object introspection capabilities greatly enhance the flexibility of 
fault tests. For example, the core logic of the above test can be generalized to 
support an arbitrary width Add circuit by inspecting the interface: 


# compute max value based on port width (length)
N = (1 << len(Add.in0)) - 1
in0, in1 = random.randint(0, N), random.randint(0, N)
tester.poke(Add.in0, in0)
tester.poke(Add.in1, in1)
tester.eval()
tester.expect(Add.out, (in0 + in1) & N)


This ability to metaprogram components as a function of the design under test 
is an essential aspect of fault’s design. It allows the construction of generic com- 
ponents that can be reused across designs with varying interfaces and behavior. 

fault’s embedding in Python’s class system provides an opportunity for reuse 
through inheritance. For example, a design team could subclass the generic 
Tester class and add a new method to perform an asynchronous reset sequence: 


class ResetTester(Tester):
    def __init__(self, circuit, clock, reset_port):
        # super() already binds self, so it is not passed again
        super().__init__(circuit, clock)
        self.reset_port = reset_port

    def reset(self):
        # asynchronous reset, negative edge
        self.poke(self.reset_port, 1)
        self.eval()
        self.poke(self.reset_port, 0)
        self.eval()
        self.poke(self.reset_port, 1)
        self.eval()


Combining inheritance with introspection, we can augment the
ResetTester to automatically discover the reset port by inspecting port types:


class AutoResetTester(ResetTester):
    def __init__(self, circuit, clock):
        # iterate over interface to find reset (assumes exactly one)
        for port in circuit.interface.ports.values():
            if isinstance(port, AsyncResetN):
                reset_port = port
        super().__init__(circuit, clock, reset_port)


2.1 Frontend: Tester API 


fault’s Python embedding is implemented by the Tester class which provides 
various interfaces for recording test actions as well as methods for compiling and 
running tests using a specific target. By using Python’s class system to perform a 
shallow embedding [5], fault avoids the complexity of processing abstract syntax 
trees and simply uses Python’s standard execution to construct test components. 
As a result, programming in fault is much like programming with a standard 
Python library. This design choice reduces the overhead of learning the DSL 
and simplifies aspects of implementation such as error messages, but comes at 
the cost of limited capabilities for describing control flow. The fault frontend 
described in this paper focuses on implementation simplicity, but the system is 
designed to be easily extended with new frontends using alternative embeddings. 


Action Methods. The Tester class provides a low-level interface for
recording actions using methods. The basic action methods are poke (set
a port to a value), expect (assert a port equals a value), step (invert
the value of the clock), peek (read the value of a port), and eval
(evaluate the circuit). The peek method returns an object containing a
reference to the value of a circuit port in the current simulation state.
Using logical and arithmetic operators, the user can construct expressions
with this object and pass the result to other actions. For example, to
expect that the value of the port O0 is equal to the inverse of the
value of port O1, the user would write
tester.expect(circuit.O0, ~tester.peek(circuit.O1)). The Tester provides
a print action to display simulation runtime information, including the
peeked values.
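

One way such a peeked value could support operators is through Python operator overloading, where each operation builds a small expression tree rather than computing a number. The sketch below is illustrative only; Expr, Peek, Const, Op, wrap, and render are hypothetical names and do not describe fault's internal classes.

```python
class Expr:
    """Hypothetical symbolic expression over port values."""

    def __invert__(self):
        return Op("~", (self,))

    def __and__(self, other):
        return Op("&", (self, wrap(other)))

    def __add__(self, other):
        return Op("+", (self, wrap(other)))


class Peek(Expr):
    """Leaf node: a reference to a port, resolved only at simulation time."""

    def __init__(self, port):
        self.port = port

    def render(self):
        return self.port


class Const(Expr):
    """Leaf node: a plain Python value lifted into the expression tree."""

    def __init__(self, value):
        self.value = value

    def render(self):
        return str(self.value)


class Op(Expr):
    """Interior node: a unary or binary operator applied to sub-expressions."""

    def __init__(self, op, args):
        self.op, self.args = op, args

    def render(self):
        if len(self.args) == 1:
            return f"({self.op}{self.args[0].render()})"
        left, right = self.args
        return f"({left.render()} {self.op} {right.render()})"


def wrap(value):
    """Lift plain Python values so they can appear next to peeked ports."""
    return value if isinstance(value, Expr) else Const(value)


inverted = ~Peek("O1")
masked = (Peek("a") + Peek("b")) & 0xFFFF
```

Because the tree is only rendered or evaluated later, the same expression can be handed to any backend that knows how to translate its operators.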


Metaprogramming Control Flow. Notably absent from the basic method 
interface described above are control flow abstractions. As noted before, standard 
Python control structures such as loops and if statements are executed in the 
first stage of computation as part of the metaprogram. However, there are cases 
where the user intends to preserve the control structure in the generated code, 
such as long-running loops that should not be unrolled at compile time or loops 
that are conditioned on dynamic values from the circuit state. For example, 
consider a while loop that executes until it receives a ready signal: 

# Construct while loop conditioned on circuit.ready.
loop = tester._while(tester.peek(circuit.ready))
loop.expect(circuit.ready, 0)  # executes inside loop
loop.step(2)  # executes inside loop

# Check final state after loop has exited
tester.expect(circuit.count, expected_cycle_count)
This logic could not be encoded in the metaprogram, because the metaprogram
is evaluated before the test is run, and thus does not know anything
about the runtime state of the circuit. To capture this dynamic control flow, 
the Tester provides methods for inserting if-else statements, for loops, 
and while loops. Each of these methods returns a new instance of the current 
Tester object which provides the same API, allowing the user to record actions 
corresponding to the body of the control construct. The Tester class provides 
convenience functions for using these control structures to generate common 
patterns, such as wait_on, wait_until_low, and wait_until_posedge. 
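

The idea that a control-flow method returns a fresh recorder for the body can be sketched in a few lines. This is a toy under stated assumptions: Recorder and its tuple-based actions are hypothetical, the loop condition is kept as a plain string, and only the _while method name mirrors the text above.

```python
class Recorder:
    """Hypothetical recorder; each control construct gets its own child."""

    def __init__(self):
        self.actions = []

    def expect(self, port, value):
        self.actions.append(("expect", port, value))

    def step(self, n=1):
        self.actions.append(("step", n))

    def _while(self, cond):
        # The child exposes the same API as the parent; whatever is recorded
        # on it becomes the loop body instead of being unrolled.
        body = Recorder()
        self.actions.append(("while", cond, body.actions))
        return body


tester = Recorder()
loop = tester._while("busy")   # condition stays symbolic until run time
loop.step(2)                   # recorded inside the loop body
tester.expect("done", 1)       # recorded at top level, after the loop
```

The parent sequence holds a single while entry whose body is the child's action list, so the loop survives into the generated test instead of being evaluated in the first stage.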


Attribute Interface. While the low-level method interface is useful for writ- 
ing complex metaprograms, simple components are rather verbose to construct. 
To simplify the handling of basic actions like poke and peek, the Tester 
object exposes an interface for referring to circuit ports and internal signals using 
Python’s object attribute syntax. For example, to poke the input port I of a 
circuit with value 1, one would write tester.circuit.I = 1. This interface 
supports referring to internal signals using a hierarchical syntax. For example, 
referring to port Q of an instance ff can be done with tester.circuit.ff.Q. 
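

Python's attribute hooks are enough to build an interface of this shape. The following sketch is an assumption about how it could be done, not a description of fault's code; PortProxy and its path method are hypothetical names.

```python
class PortProxy:
    """Hypothetical proxy: attribute writes become pokes, reads build paths."""

    def __init__(self, actions, path=()):
        # Bypass our own __setattr__ while storing internal state.
        object.__setattr__(self, "_actions", actions)
        object.__setattr__(self, "_path", path)

    def __getattr__(self, name):
        # Only called for names that are not real attributes, so each access
        # extends the hierarchical path (for example ff, then ff.Q).
        return PortProxy(self._actions, self._path + (name,))

    def __setattr__(self, name, value):
        # Assignment is recorded as a poke on the fully qualified port name.
        self._actions.append(("poke", ".".join(self._path + (name,)), value))

    def path(self):
        return ".".join(self._path)


actions = []
circuit = PortProxy(actions)
circuit.I = 1        # recorded as a poke of port I
q = circuit.ff.Q     # a reference to port Q of instance ff
circuit.ff.D = 0     # hierarchical poke
```

A real implementation would validate names against the circuit definition; this toy accepts any attribute name.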


Assume/Guarantee. The Tester object provides methods for specifying 
assumptions and guarantees that are abstracted over constrained random and 
formal model checking runtime environments. An assumption is a constraint 
on input values, and a guarantee is an assertion on output values. Assumptions
and guarantees are specified using Python lambda functions that return
symbolic expressions referring to the input and output ports of a circuit.
For example, the guarantee lambda a, b, c: (c >= a) and (c >= b) 
states that the output c is always greater than or equal to the inputs a and 
b. Here is an example of verifying a simple ALU using the assume/guarantee 
interface: 

# Configuration sequence for opcode register
tester.circuit.opcode_en = 1
tester.circuit.opcode = 0  # opcode for add (+)
tester.step(2)
tester.circuit.opcode_en = 0
tester.step(2)
# Verify add does not overflow
tester.circuit.a.assume(lambda a: a < BitVector[16](32768))
tester.circuit.b.assume(lambda b: b < BitVector[16](32768))
tester.circuit.c.guarantee(
    lambda a, b, c: (c >= a) and (c >= b)
)

Note that this example demonstrates the use of poke and step to initialize
circuits not only for constrained random testing, but also for formal verification.


2.2 Actions IR 


In using the Tester API, users construct a sequence of Action objects that are
used as an intermediate representation (IR) for the compiler. Basic port action
objects, such as Poke and Expect, simply store references to ports and values.
Control flow action objects, such as While and If, contain sub-sequences of
actions, resulting in a hierarchical data structure similar to an abstract syntax
tree. This view of the compiler internals reveals that the metaphor of recording
actions is really an abstraction over the construction of program fragments.
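

A hierarchical action IR of this kind lends itself to a recursive walk. The sketch below uses hypothetical dataclasses that borrow the names Poke, Expect, and While from the text; the lower function and the SystemVerilog-like output it produces are illustrative assumptions, not the code fault emits.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Poke:
    port: str
    value: int


@dataclass
class Expect:
    port: str
    value: int


@dataclass
class While:
    cond: str
    body: List[object] = field(default_factory=list)


def lower(actions, indent=0):
    """Recursively lower actions to SystemVerilog-like lines (illustrative)."""
    pad = "    " * indent
    lines = []
    for action in actions:
        if isinstance(action, Poke):
            lines.append(f"{pad}{action.port} = {action.value};")
        elif isinstance(action, Expect):
            lines.append(f"{pad}assert ({action.port} == {action.value});")
        elif isinstance(action, While):
            # Control-flow nodes carry a sub-sequence that is lowered in turn.
            lines.append(f"{pad}while ({action.cond}) begin")
            lines.extend(lower(action.body, indent + 1))
            lines.append(f"{pad}end")
        else:
            raise TypeError(f"unknown action: {action!r}")
    return lines


program = [
    Poke("en", 1),
    While("!ready", [Poke("clk", 1), Poke("clk", 0)]),
    Expect("count", 3),
]
text = "\n".join(lower(program))
```

Each backend would supply its own version of such a walk, which is what keeps the recorded test independent of any one tool.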


2.3 Backend Targets 


fault supports a variety of open-source and commercial backend targets for run- 
ning tests. A target is responsible for consuming an action sequence, compiling 
it into a format compatible with the target runtime, and providing an API for 
invoking the runtime. Targets must also report the result of the test either by 
reading the exit code of running the process or processing the test output. 
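

The three responsibilities of a target (consume actions, compile them, run and report) can be mimicked with the Python interpreter standing in for a simulator. This is a sketch under stated assumptions: PythonScriptTarget, its compile and run methods, and the tuple-based action format are all hypothetical. The result is reported through the process exit code, one of the two reporting options mentioned above.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


class PythonScriptTarget:
    """Hypothetical target: lowers actions to a Python script and runs it."""

    def compile(self, actions):
        lines = ["import sys", "state = {}"]
        for kind, name, value in actions:
            if kind == "poke":
                lines.append(f"state[{name!r}] = {value!r}")
            elif kind == "expect":
                # A failed expectation ends the process with exit code 1.
                lines.append(f"if state.get({name!r}) != {value!r}: sys.exit(1)")
            else:
                raise ValueError(f"unsupported action: {kind}")
        return "\n".join(lines) + "\n"

    def run(self, actions):
        with tempfile.TemporaryDirectory() as tmp:
            script = Path(tmp) / "test_bench.py"
            script.write_text(self.compile(actions))
            # Run the generated test bench as a separate process.
            result = subprocess.run([sys.executable, str(script)])
        return result.returncode == 0


target = PythonScriptTarget()
passed = target.run([("poke", "a", 1), ("expect", "a", 1)])
failed = target.run([("poke", "a", 1), ("expect", "a", 2)])
```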


Verilog Simulation Targets. The fault compiler includes support for the 
open-source Verilog simulators verilator [17] and iverilog [22], plus three com- 
mercial simulators. To compile fault programs to a verilator test bench, the 
backend lowers the action sequence into a C++ program that interacts with the 
software simulation object produced by the verilator compiler. For iverilog and 
the commercial simulators, the backend lowers the action sequence into a Sys- 
temVerilog test bench that interacts with the test circuit through an initial 
block inside the top-level module. One useful aspect of the SystemVerilog backend
is its handling of variations in the feature support of target simulators. For
example, the commercial simulators use different commands for enabling wave- 
form tracing and iverilog uses a non-standard API for interacting with files. 
Constrained random inputs are generated using rejection or SMT [9] sampling. 
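

Rejection sampling is the simpler of the two strategies: draw a uniformly random value and discard it if it violates the assumption. The sketch below is a generic illustration rather than fault's sampler; rejection_sample, max_tries, and the reference adder are assumptions introduced here, and the constraint mirrors the ALU example of Sect. 2.1.

```python
import random


def rejection_sample(assumption, width, rng, max_tries=10_000):
    """Draw one `width`-bit value satisfying `assumption` by rejection."""
    for _ in range(max_tries):
        candidate = rng.randrange(1 << width)
        if assumption(candidate):
            return candidate
    raise RuntimeError("assumption rejected every candidate")


def assume_small(value):
    """Assumption from the ALU example: the operand is below 32768."""
    return value < 32768


rng = random.Random(0)  # fixed seed so the run is repeatable

samples = [
    (rejection_sample(assume_small, 16, rng), rejection_sample(assume_small, 16, rng))
    for _ in range(100)
]

# Check the guarantee c >= a and c >= b on a reference 16-bit adder.
holds = all(((a + b) & 0xFFFF) >= max(a, b) for a, b in samples)
```

Rejection is cheap when a large fraction of candidates satisfies the constraint, as here where half of all 16-bit values do. For tight constraints most draws are discarded, which is where solver-based sampling becomes attractive.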


CoSA. The CoreIR Symbolic Analyzer (CoSA) is a solver-agnostic SMT-based 
hardware model checker [13]. fault’s CoSA target relies on magma’s ability 
to compile Python circuit descriptions to CoreIR [8], a hardware intermediate 
representation. CoreIR’s formal semantics are based on finite-state machines and 
the SMT theory of fixed-size bitvectors [3]. fault action sequences are lowered 
into CoSA’s custom explicit transition system format (ETS) and combined with 
the CoreIR representation of the circuit to produce a model. CoSA allows the 
user to specify assumptions and properties, providing a straightforward lowering 
of fault assumptions and guarantees. 


SPICE. In addition to being able to test designs with Verilog simulators, fault
supports analog and mixed-signal simulators. Compared to the traditional
approach of maintaining separate implementations for digital and analog tests, this
is a significantly easier way to write tests for mixed-signal circuits. Basic actions
such as poke and expect are supported in the SPICE simulation mode, but
they are implemented quite differently than they are in Verilog-based tests.
Rather than emitting a sequential list of actions in an initial block, fault
compiles poke actions into piecewise-linear (PWL) waveforms. Other actions,
such as expect, are implemented by post-processing the simulation data.
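

To make the PWL idea concrete, the sketch below turns a time-ordered list of pokes into waveform breakpoints. It is a simplified illustration, not fault's compiler: pokes_to_pwl, format_pwl, the rise_time parameter, and the chosen voltages are assumptions. Only the PWL(t1 v1 t2 v2 ...) form of a SPICE source is standard.

```python
def pokes_to_pwl(pokes, rise_time=1e-10, initial=0.0):
    """Convert time-ordered (time, voltage) pokes into PWL breakpoints.

    The signal holds its previous voltage until each poke time, then ramps
    linearly to the new voltage over `rise_time` seconds.
    """
    points = [(0.0, initial)]
    level = initial
    for time, voltage in pokes:
        if time <= points[-1][0]:
            raise ValueError("pokes must be ordered and spaced by more than rise_time")
        points.append((time, level))                # hold old level until the poke
        points.append((time + rise_time, voltage))  # ramp to the new level
        level = voltage
    return points


def format_pwl(points):
    """Render breakpoints in the SPICE `PWL(t1 v1 t2 v2 ...)` source syntax."""
    body = " ".join(f"{t:g} {v:g}" for t, v in points)
    return f"PWL({body})"


# Poke the node to 1.2 V at 1 ns, then back to 0 V at 3 ns.
points = pokes_to_pwl([(1e-9, 1.2), (3e-9, 0.0)])
source = format_pwl(points)
```

Each poke contributes two breakpoints, one that holds the old level and one that completes the ramp, so the waveform stays piecewise linear without instantaneous jumps.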


Verilog-AMS. For designs containing a mixture of SPICE and Verilog blocks, 
fault supports testing with a Verilog-AMS simulator. This mode is more similar 
to running SystemVerilog-based tests than SPICE-based tests. In particular, the
test bench is implemented using a top-level SystemVerilog module, meaning that
a wide range of actions are supported including loops and conditionals. This is a 
key benefit of using a Verilog-AMS simulator as opposed to a SPICE simulator. 


3 Evaluation 


To demonstrate fault’s capabilities, we evaluate the runtime performance of four 
different testing tasks from the domain of hardware verification. Each task high- 
lights the utility of fault’s portability by reusing the same source input across 
separate trials of different targets. Due to licensing restrictions, we omit the
names of the commercial simulators and replace them with generic names. The
code to reproduce these experiments is available in the artifact.² Each experiment
involves at least one open-source simulator, but reproducing all the results
requires access to commercial simulators. 


CGRA Processing Element Unit Tests. To demonstrate the capability of 
fault as a tool for writing portable tests for digital verification, Fig. 2 reports
the runtime performance of a subset of the lassen test suite. lassen [19] is 
an open-source implementation of a CGRA processing element that contains a 
large suite of unit tests using fault. Interestingly, we see comparable perfor- 
mance between verilator and commercial simulator 1, while commercial 
simulator 2 is consistently ~5x slower than the others. One important property 
of the lassen test suite is that it generates a new test bench for each operation 
and input/output pair. This stresses a simulator’s ability to efficiently handle 
incremental changes, since each invocation involves a new top-level test bench 
file, but an unchanged design under test. 


Test                    verilator   commercial sim 1   commercial sim 2
test_unsigned_binary       94.483             88.700            519.079
test_smult                 31.439             28.668            170.115
test_fp_binary_op         104.117             91.878            571.759
test_stall                 10.424              9.629             56.458


Fig. 2. Runtime (s) for unit tests of a CGRA processing element collected with a VM 
running on an Intel(R) Xeon(R) Silver 4214 CPU @ 2.20 GHz with 256 GB of RAM. 


2 https://github.com/leonardt/fault_artifact/blob/master/README.md.




SRAM Array. To demonstrate the capability of fault as a tool for writing 
portable tests for analog and mixed-signal verification, we used OpenRAM to 
generate a 16x16 SRAM and then ran a randomized readback test of the design 
with SPICE, Verilog-AMS, and SystemVerilog simulators. OpenRAM [10] is an 
open-source memory compiler that produces a SPICE netlist and Verilog model. 

The results shown in Fig. 3a reveal two interesting trends. First, as expected, 
SPICE simulations of the array were significantly slower than Verilog simulations 
(100-1000x). Since fault allows the user to prototype tests with fast Verilog 
simulations, and then seamlessly switch to SPICE for signoff verification, our tool 
may reduce the latency in developing mixed-signal tests by orders of magnitude. 
Second, even for simulations of the same type, there was significant variation 
in the runtime of different simulators. SPICE simulation time varied by about 
2x, while Verilog simulation time varied by about 10x. One of the advantages of 
using fault is that it is easy to switch between simulators to find the one that 
works best for a particular scenario. 


Target           Simulator    Runtime (s)
spice            ngspice          117.660
spice            comm sim 1       199.868
spice            comm sim 2        98.043
system-verilog   iverilog           0.238
system-verilog   comm sim 1         1.081
system-verilog   comm sim 2         2.807
verilog-ams      comm sim 1       228.405

(a) Runtime using a VM on an Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz with
64GB of RAM.

Implementation                              Lines of Code (LoC)
fault                                                       136
Handwritten SPICE                                           223
Handwritten SystemVerilog and Verilog-AMS                   189

(b) LoC for fault and language-specific implementations of the test.


Fig. 3. Results for OpenRAM 16x16 SRAM randomized readback test. 


We also looked at the amount of human effort required to use fault to imple- 
ment this test as compared to the traditional approach of writing separate test- 
benches for each simulation language. Since “human effort” is subjective, we 
used lines of code as a rough metric, as measured from handwritten implemen- 
tations of the same test in SystemVerilog, Verilog-AMS, and SPICE. Figure 3b 
shows the results of this experiment: the fault-based approach used 136 LoC as 
compared to 412 LoC for the traditional approach, a reduction of 3.02x. 


CGRA Integration Test Bench. To observe how fault scales to more com- 
plex testing tasks, we report numbers for an integration test of the Stanford 
Garnet CGRA [18]. This test generates an instance of the CGRA chip, runs a 
simulation that programs the chip for an image processing application, streams 
the input image data onto the chip, and streams the output image data to a 
file. The output is compared to a reference software model. Running the test 
took 232 min with the verilator target, 185 min with commercial simulator
1, and 221 min with commercial simulator 2. Leveraging the portability of
fault-based tests could save up to 47 min in testing time. These results were
collected using the same machine as the SRAM experiment (see Fig. 3a). 


Unified Constrained Random and Formal. To demonstrate the utility of 
the assume/guarantee interface as a unified abstraction for constrained random 
and formal verification, we compared the runtime performance of using a con- 
strained random target versus a formal model checker to verify the simple ALU 
property shown in Sect. 2.1. The first test evaluated the runtime performance of 
verifying correctness of the property on 100 constrained random inputs versus 
using a formal model checker. The formal model checker provided a complete 
proof of correctness using interpolation-based model checking [14] in 1.613 s, 
while constrained random verified 100 samples in 2.269 s (rejection sampling) 
and 2.799 s (SMT sampling). The second test injected a bug into the ALU by 
swapping the opcodes for addition and subtraction. The model checker found a 
counterexample in 1.154 s with bounded model checking [4], while constrained 
random failed in 2.947 s (rejection sampling) and 1.230 s (SMT sampling). In 
both cases the model checker was at least as fast as the constrained random 
equivalent while providing better coverage in the case of no bug. These results 
were collected using a MacBook Pro (13-in 2017, 4 Thunderbolt, macOS 10.15.2), 
with a 3.5 GHz Dual-Core Intel i7 CPU, and 16 GB RAM. 


4 Related Work 


Prior work has used a generic API to Verilog simulators to build portability
into testing infrastructures. The ChiselTest library [2] and cocotb [7]
provide this capability for Scala and Python respectively. Using a generic API
offers many of the same advantages with regard to test portability, simplicity,
and automation, but the lack of multi-stage execution limits the application
to more diverse backend targets such as SPICE simulations and formal
model checkers. However, because these libraries interact with the simulator 
directly, they do allow user code to immediately respond to the simulator state, 
enabling interactive debugging through the host language. cocotb also presents 
a coroutine abstraction that naturally models the concurrency found in hard- 
ware simulation. Future work could investigate using cocotb as a runtime target 
for fault’s frontend, enabling a similar concurrent, interactive style of testing. 
Another interesting avenue of work would be to extend fault’s backend targets 
to support lowering cocotb’s coroutine abstraction. 


5 Conclusion 


The ethos of fault is to enable the construction of flexible, portable test com- 
ponents that are simple to integrate and scale for testing complex applications. 
The ability to metaprogram test components is essential for enabling verification
teams to match the productivity of design teams using generators. fault’s porta- 
bility enables teams to easily transition to different tools for different use cases, 
and enables the proliferation of reusable verification libraries that are applicable 
in a diverse set of tooling environments. 

While fault has already demonstrated utility to design teams in academia 
and industry, there remains a bright future filled with opportunity to improve 
the system. Extending the assume/guarantee interface to support temporal
properties/constraints and leverage compositional reasoning [6] is essential for
scaling the approach to more complex systems. Adding concurrent programming
abstractions such as coroutines is essential for capturing the common patterns
used in the testing of parallel hardware. Using a deep embedding architecture 
could significantly improve the performance of generating fault test benches. 


Funding. The authors would like to thank the DARPA DSSoC (FA8650-18-2- 
7861) and POSH (FA8650-18-2-7854) programs, the Stanford AHA and SystemX 
affiliates, Intel’s Agile ISTC, the Hertz Foundation Fellowship, and the Stanford

Abstract. Craig interpolant generation for non-linear theory and
its combination with other theories is still in its infancy, although
interpolation-based techniques have become popular in the verification
of programs and hybrid systems, where non-linear expressions are very
common. In this paper, we first prove that a polynomial interpolant of
the form $h(\mathbf{x}) > 0$ exists for two mutually contradictory polynomial
formulas $\phi(\mathbf{x},\mathbf{y})$ and $\psi(\mathbf{x},\mathbf{z})$, each of the form
$f_1 \ge 0 \wedge \cdots \wedge f_n \ge 0$, where the $f_i$ are polynomials in
$\mathbf{x},\mathbf{y}$ or $\mathbf{x},\mathbf{z}$, and the quadratic module generated
by the $f_i$ is Archimedean. Then, we show that synthesizing such an interpolant
can be reduced to solving a semi-definite programming (SDP) problem.
In addition, we propose a verification approach to ensure the validity
of the synthesized interpolant and consequently avoid the unsoundness
caused by numerical error in SDP solving. We also discuss how to
generalize our approach to general semi-algebraic formulas. Finally, as
an application, we demonstrate how to apply our approach to invariant
generation in program verification.


Keywords: Craig interpolant - Archimedean condition - Semi-definite 
programming - Program verification - Sum of squares 


1 Introduction 


Interpolation-based techniques have become popular in recent years because of
their inherently modular and local reasoning, which can scale up existing formal
verification techniques such as theorem proving, model checking, and abstract
interpretation, for which scalability is the main bottleneck. The
study of interpolation was pioneered by Krajíček [20] and Pudlák [30] in
connection with theorem proving, by McMillan in connection with model checking
[25], by Graf and Saïdi [14], Henzinger et al. [16] and McMillan [26] in
connection with abstraction techniques such as CEGAR, and by Wang et al. [17] in
connection with machine-learning-based program verification.

Craig interpolant generation plays a central role in interpolation-based
techniques, and has therefore drawn increasing attention. The literature
contains various efficient algorithms for automatically synthesizing
interpolants for decidable fragments of first-order logic, linear arithmetic, array logic,
equality logic with uninterpreted functions (EUF), etc., and their combinations,
together with their use in verification, e.g., [6, 16, 18, 19, 26, 27, 33, 33, 37] and the references
therein. Additionally, how to compare the strength of different interpolants is
investigated in [9]. However, interpolant generation for non-linear theory and its
combination with the aforementioned theories is still in its infancy, although
non-linear polynomial inequalities are quite common in safety-critical software and
embedded systems [38, 39].

In [7], Dai et al. made a first attempt and gave an algorithm for generating
interpolants for conjunctions of mutually contradictory nonlinear polynomial
inequalities, based on the existence of a witness guaranteed by Stengle’s
Positivstellensatz [36], which is computable using semi-definite programming (SDP).
Their algorithm is incomplete in general, but it becomes complete if all variables
are bounded (the so-called Archimedean condition). A major limitation of their
work is that the two mutually contradictory formulas $\phi$ and $\psi$ must have the same
set of variables. In [10], Gan et al. proposed an algorithm to generate
interpolants for quadratic polynomial inequalities. The basic idea rests on the
insight that, for analyzing the solution space of concave quadratic polynomial
inequalities, it suffices to linearize them; this is established by proving a
generalization of Motzkin’s transposition theorem for concave quadratic polynomial
inequalities. Moreover, they also discussed how to generate interpolants for the
combination of the theory of quadratic concave polynomial inequalities and EUF
based on the hierarchical calculus proposed in [34] and used in [33]. Obviously,
quadratic concave polynomial inequalities form a very restrictive class of polynomial
formulas, although most existing abstract domains fall within it, as argued in [10].
Meanwhile, in [13], Gao and Zufferey presented an approach to extract interpolants for
non-linear formulas, possibly containing transcendental functions and differential
equations, from proofs of unsatisfiability generated by the $\delta$-decision procedure [12]
based on interval constraint propagation (ICP) [1]; proof traces
from $\delta$-complete decision procedures are transformed into interpolants that consist of
Boolean combinations of linear constraints. Thus, their approach can only find
interpolants between two formulas whose conjunction is not $\delta$-satisfiable.
A similar idea was also reported in [21]. In [5], Chen et al. proposed an
approach for synthesizing non-linear interpolants based on counterexample guidance
and machine learning, but it relies on quantifier elimination to guarantee
completeness and convergence, which makes the approach inefficient in theory.
In [35], Srikanth et al. presented an approach
called CAMPY to exploit non-linear interpolant generation, which is achieved
by abstracting non-linear formulas (possibly with non-polynomial expressions)
to the theory of linear arithmetic with uninterpreted functions, i.e., EUFLIA,
to prove or disprove that a given program satisfies a given property that may
contain nonlinear expressions.


Example 1. In order to compare the approach proposed in this paper and the 
ones aforementioned, consider 


2 


$ = —2ry? + 2? — 3az—y® — yz +z? -1>0A100—-2?-y’? >0A 

a2? + y?2? x? y? i 5 (a4 Qa? y? y*) aye y°) 4 <0; 
y% = 4(x — y) + (x +y)? + w? — 133.097 < 0A 100(x + y)? — w(x — y)* — 3000 > 0. 
It can be checked that $\phi \wedge \psi \models \bot$.


Obviously, synthesizing interpolants for $\phi$ and $\psi$ in this example is beyond
the ability of the approaches reported in [7, 10]. The method of [13], as
implemented in dReal3, returns “SAT” with $\delta = 0.001$, i.e., $\phi \wedge \psi$ is
$\delta$-satisfiable, and hence it cannot synthesize any interpolant using the approach of [12]
with any precision greater than 0.001¹. In contrast, using our method, an interpolant
$h > 0$ with degree 10 can be found, as shown in Fig. 1². Additionally, using
the symbolic procedure REDUCE, it can be proved that $h > 0$ is indeed an
interpolant of $\phi$ and $\psi$.


In this paper, we investigate this issue and consider how to synthesize an
interpolant for two polynomial formulas $\phi(\mathbf{x},\mathbf{y})$ and $\psi(\mathbf{x},\mathbf{z})$ with
$\phi(\mathbf{x},\mathbf{y}) \wedge \psi(\mathbf{x},\mathbf{z}) \models \bot$, where

$$\phi(\mathbf{x},\mathbf{y}) : f_1(\mathbf{x},\mathbf{y}) \ge 0 \wedge \cdots \wedge f_m(\mathbf{x},\mathbf{y}) \ge 0,$$
$$\psi(\mathbf{x},\mathbf{z}) : g_1(\mathbf{x},\mathbf{z}) \ge 0 \wedge \cdots \wedge g_n(\mathbf{x},\mathbf{z}) \ge 0,$$

$\mathbf{x} \in \mathbb{R}^r$, $\mathbf{y} \in \mathbb{R}^s$, $\mathbf{z} \in \mathbb{R}^t$ are variable vectors,
$r, s, t \in \mathbb{N}$, and $f_1,\dots,f_m,g_1,\dots,g_n$ are polynomials. In addition,
$\mathcal{M}_{\mathbf{x},\mathbf{y}}\{f_1(\mathbf{x},\mathbf{y}),\dots,f_m(\mathbf{x},\mathbf{y})\}$ and
$\mathcal{M}_{\mathbf{x},\mathbf{z}}\{g_1(\mathbf{x},\mathbf{z}),\dots,g_n(\mathbf{x},\mathbf{z})\}$ are two
Archimedean quadratic modules. Here we allow uncommon variables, which are
not allowed in [7], and drop the constraint that polynomials must be concave
and quadratic, which is assumed in [10]. The Archimedean condition amounts to
requiring that all variables be bounded, which is reasonable in program
verification, as only bounded numbers can be represented in a computer in practice.
We first prove theoretically that there exists a polynomial $h(\mathbf{x})$ such that
$h(\mathbf{x}) = 0$ separates the state space of $\mathbf{x}$ defined by $\phi(\mathbf{x},\mathbf{y})$ from
the one defined by $\psi(\mathbf{x},\mathbf{z})$, and then propose an algorithm to compute
such $h(\mathbf{x})$ based on SDP. Furthermore, we propose a verification approach
to ensure the validity of the synthesized interpolant and consequently avoid the
unsoundness caused by numerical error in SDP solving. Finally, we also discuss
how to extend our results to general semi-algebraic constraints.

Fig. 1. Example 1. (Green region: the projection of $\phi(x,y,z)$ onto $x$ and $y$;
red region: the projection of $\psi(x,y,w)$ onto $x$ and $y$; gray region plus the
green region: the synthesized interpolant $\{(x,y) \mid h(x,y) > 0\}$.) (Color
figure online)

1 Alternatively, if we try the formula with the latest version of dReal4, it does not
produce any output after 20 h.

2 The mathematical representation of $h$ is given in the full version [11].

Another contribution of this paper is an application: we illustrate
how to apply our approach to invariant generation in program verification by
revising the framework of Lin et al. [22] for invariant generation, which is based
on weakest precondition, strongest postcondition and interpolation, so that it
can generate nonlinear invariants.

The paper is organized as follows. Some preliminaries and the problem of 
interest are introduced in Sect. 2. Section 3 shows the existence of an interpolant 
for two mutually contradictory polynomial formulas only containing conjunction, 
and Sect. 4 presents SDP-based methods to compute it. In Sect. 5, we discuss how 
to avoid unsoundness caused by numerical error in SDP. Section 6 extends our 
approach to general polynomial formulas. Section 7 demonstrates how to apply 
our approach to invariant generation in program verification. We conclude this 
paper in Sect. 8. 


2 Preliminaries 


In this section, we first give a brief introduction on some notions used throughout 
this paper and then describe the problem of interest. 


2.1 Quadratic Module 


$\mathbb{N}$, $\mathbb{Q}$ and $\mathbb{R}$ are the sets of natural numbers, rational numbers and real
numbers, respectively. $\mathbb{Q}[\mathbf{x}]$ and $\mathbb{R}[\mathbf{x}]$ denote the polynomial rings over
the rational numbers and the real numbers in $r \ge 1$ indeterminates
$\mathbf{x} := (x_1,\dots,x_r)$. We use $\mathbb{R}[\mathbf{x}]^2 := \{p^2 \mid p \in \mathbb{R}[\mathbf{x}]\}$
for the set of squares and $\sum \mathbb{R}[\mathbf{x}]^2$ for the set of sums of squares of
polynomials in $\mathbf{x}$. Vectors are denoted by boldface letters. $\bot$ and $\top$ stand for
false and true, respectively.


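Definition 1 (Quadratic Module [24]). A subset $\mathcal{M}$ of $\mathbb{R}[\mathbf{x}]$ is called a
quadratic module if it contains 1 and is closed under addition and multiplication
with squares, i.e., $1 \in \mathcal{M}$, $\mathcal{M} + \mathcal{M} \subseteq \mathcal{M}$, and
$p^2 \mathcal{M} \subseteq \mathcal{M}$ for all $p \in \mathbb{R}[\mathbf{x}]$.

Let $\bar{p} := \{p_1,\dots,p_s\}$ be a finite subset of $\mathbb{R}[\mathbf{x}]$. The quadratic module
$\mathcal{M}_{\mathbf{x}}(\bar{p})$, or simply $\mathcal{M}(\bar{p})$, generated by $\bar{p}$ (i.e., the smallest
quadratic module containing all $p_i$) is

$$\mathcal{M}_{\mathbf{x}}(\bar{p}) = \Bigl\{ \sum_{i=0}^{s} \delta_i p_i \;\Bigm|\; \delta_i \in \sum \mathbb{R}[\mathbf{x}]^2 \Bigr\},$$

where $p_0 = 1$.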


Archimedean condition plays a key role in the study of polynomial optimiza- 
tion. 


Definition 2 (Archimedean). Let $\mathcal{M}$ be a quadratic module of $\mathbb{R}[\mathbf{x}]$. $\mathcal{M}$ is
said to be Archimedean if there exists some $a > 0$ such that $a - \sum_{i=1}^{r} x_i^2 \in \mathcal{M}$.
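
As a small illustration of Definitions 1 and 2 (an example of ours, not taken from the source), consider one variable $x$ and the generators $p_1 = x$ and $p_2 = 1 - x$. The multipliers below are hand-picked sums of squares; they show that $1 - x^2$ belongs to the generated quadratic module, so the module is Archimedean with $a = 1$.

```latex
% Illustrative example (not from the paper): r = 1, p_1 = x, p_2 = 1 - x.
% Hand-picked sum-of-squares multipliers:
%   delta_0 = 0,  delta_1 = (1 - x)^2,  delta_2 = 1 + x^2.
\begin{align*}
\delta_1 p_1 + \delta_2 p_2
  &= (1 - x)^2\, x + (1 + x^2)(1 - x) \\
  &= (x - 2x^2 + x^3) + (1 - x + x^2 - x^3) \\
  &= 1 - x^2 .
\end{align*}
% Hence a - x^2 with a = 1 lies in M(x, 1 - x), i.e. the module is Archimedean.
```

This agrees with the informal reading of the condition: the set $\{x \mid x \ge 0,\ 1 - x \ge 0\} = [0,1]$ is bounded.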


2.2 Problem Description


Craig showed that, given two formulas $\phi$ and $\psi$ in a first-order theory $\mathcal{T}$, if
$\phi \models \psi$, then there always exists an interpolant $I$ over the common symbols of $\phi$
and $\psi$ such that $\phi \models I$ and $I \models \psi$. In the verification literature, this terminology has
been abused following [26], where a reverse interpolant (a term coined by Kovács and
Voronkov in [19]) $I$ over the common symbols of $\phi$ and $\psi$ is defined as follows.


Definition 3 (Interpolant). Given two formulas $\phi$ and $\psi$ in a theory $\mathcal{T}$ s.t.
$\phi \wedge \psi \models_{\mathcal{T}} \bot$, a formula $I$ is an interpolant of $\phi$ and $\psi$ if
(i) $\phi \models_{\mathcal{T}} I$; (ii) $I \wedge \psi \models_{\mathcal{T}} \bot$; and (iii) $I$ only contains
common symbols and free variables shared by $\phi$ and $\psi$.
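
To see the three conditions of Definition 3 at work, here is a tiny hand-made instance (our own illustration, not from the source). The formulas are hypothetical and are not of the Archimedean form, which Definition 3 does not require.

```latex
% Illustrative instance of Definition 3 (hypothetical formulas).
%   phi(x, y):  x - y^2 - 1 >= 0      (y is local to phi)
%   psi(x, z):  -x - z^2    >= 0      (z is local to psi)
%   I(x):       x > 0
\begin{align*}
\text{(i)}\quad   & \phi \models x \ge 1 + y^2 \ge 1, \text{ hence } \phi \models x > 0; \\
\text{(ii)}\quad  & \psi \models x \le -z^2 \le 0, \text{ hence } x > 0 \wedge \psi \models \bot; \\
\text{(iii)}\quad & I \text{ mentions only the shared variable } x .
\end{align*}
```

Here the interpolant has the shape $h(x) > 0$ with $h(x) = x$, which is the shape of interpolant studied in the rest of the paper.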


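Definition 4. A basic semi-algebraic set $\{\mathbf{x} \in \mathbb{R}^n \mid \bigwedge_{i=1}^{s} p_i(\mathbf{x}) \ge 0\}$ is called
a set of the Archimedean form if $\mathcal{M}_{\mathbf{x}}\{p_1(\mathbf{x}),\dots,p_s(\mathbf{x})\}$ is Archimedean, where
$p_i(\mathbf{x}) \in \mathbb{R}[\mathbf{x}]$, $i = 1,\dots,s$.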


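The interpolant synthesis problem of interest in this paper is given in Problem 1.

Problem 1. Let $\phi(\mathbf{x},\mathbf{y})$ and $\psi(\mathbf{x},\mathbf{z})$ be two polynomial formulas of the form

$$\phi(\mathbf{x},\mathbf{y}) : f_1(\mathbf{x},\mathbf{y}) \ge 0 \wedge \cdots \wedge f_m(\mathbf{x},\mathbf{y}) \ge 0,$$
$$\psi(\mathbf{x},\mathbf{z}) : g_1(\mathbf{x},\mathbf{z}) \ge 0 \wedge \cdots \wedge g_n(\mathbf{x},\mathbf{z}) \ge 0,$$

where $\mathbf{x} \in \mathbb{R}^r$, $\mathbf{y} \in \mathbb{R}^s$, $\mathbf{z} \in \mathbb{R}^t$ are variable vectors, $r, s, t \in \mathbb{N}$,
and $f_1,\dots,f_m,g_1,\dots,g_n$ are polynomials in the corresponding variables. Suppose
$\phi \wedge \psi \models \bot$, and $\{(\mathbf{x},\mathbf{y}) \mid \phi(\mathbf{x},\mathbf{y})\}$ and
$\{(\mathbf{x},\mathbf{z}) \mid \psi(\mathbf{x},\mathbf{z})\}$ are semi-algebraic sets of the
Archimedean form. Find a polynomial $h(\mathbf{x})$ such that $h(\mathbf{x}) > 0$ is an interpolant
for $\phi$ and $\psi$.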


3 Existence of Interpolants 


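The basic idea and steps of proving the existence of interpolants are as follows.
Because an interpolant of $\phi$ and $\psi$ contains only the common symbols of $\phi$ and
$\psi$, it is natural to consider the projections of the sets defined by $\phi$ and $\psi$ on $\mathbf{x}$,
i.e., $P_{\mathbf{x}}(\phi(\mathbf{x},\mathbf{y})) = \{\mathbf{x} \mid \exists \mathbf{y}.\ \phi(\mathbf{x},\mathbf{y})\}$ and
$P_{\mathbf{x}}(\psi(\mathbf{x},\mathbf{z})) = \{\mathbf{x} \mid \exists \mathbf{z}.\ \psi(\mathbf{x},\mathbf{z})\}$, which are
obviously disjoint. We therefore prove that, if $h(\mathbf{x}) = 0$ separates $P_{\mathbf{x}}(\phi(\mathbf{x},\mathbf{y}))$
and $P_{\mathbf{x}}(\psi(\mathbf{x},\mathbf{z}))$, then $h(\mathbf{x})$ solves Problem 1 (see Proposition 1). Thus, we only
need to prove the existence of such $h(\mathbf{x})$, which is done in the following steps. First, we
prove that $P_{\mathbf{x}}(\phi(\mathbf{x},\mathbf{y}))$ and $P_{\mathbf{x}}(\psi(\mathbf{x},\mathbf{z}))$ are compact semi-algebraic sets which are
unions of finitely many basic closed semi-algebraic sets (see Lemma 1). Second,
using Putinar’s Positivstellensatz, we prove that, for two disjoint basic closed
semi-algebraic sets $S_1$ and $S_2$ of the Archimedean form, there exists a polynomial
$h_1(\mathbf{x})$ such that $h_1(\mathbf{x}) = 0$ separates $S_1$ and $S_2$ (see Lemma 2). This result is then
extended to the case that $S_2$ is a finite union of basic closed semi-algebraic sets
(see Lemma 3). Finally, by generalizing Lemma 3 to the case that both compact
semi-algebraic sets are unions of finitely many basic closed semi-algebraic
sets, and together with Proposition 1, we prove the existence of an interpolant in
Theorem 2 and Corollary 1.

Proposition 1. If $h(\mathbf{x}) \in \mathbb{R}[\mathbf{x}]$ satisfies the following constraints

$$\forall \mathbf{x} \in P_{\mathbf{x}}(\phi(\mathbf{x},\mathbf{y})).\ h(\mathbf{x}) > 0 \quad\text{and}\quad \forall \mathbf{x} \in P_{\mathbf{x}}(\psi(\mathbf{x},\mathbf{z})).\ h(\mathbf{x}) < 0, \qquad (1)$$

then $h(\mathbf{x}) > 0$ is an interpolant for $\phi(\mathbf{x},\mathbf{y})$ and $\psi(\mathbf{x},\mathbf{z})$, where $\phi(\mathbf{x},\mathbf{y})$ and $\psi(\mathbf{x},\mathbf{z})$
are defined as in Problem 1.


Proof. According to Definition 3, it is enough to prove that $\phi(\mathbf{x},\mathbf{y}) \models h(\mathbf{x}) > 0$
and $\psi(\mathbf{x},\mathbf{z}) \models h(\mathbf{x}) \le 0$ (the latter is equivalent to $h(\mathbf{x}) > 0 \wedge \psi(\mathbf{x},\mathbf{z}) \models \bot$).

Since any $(\mathbf{x}_0,\mathbf{y}_0)$ satisfying $\phi(\mathbf{x},\mathbf{y})$ must imply $\mathbf{x}_0 \in P_{\mathbf{x}}(\phi(\mathbf{x},\mathbf{y}))$, it follows
from (1) that $h(\mathbf{x}_0) > 0$, and hence $\phi(\mathbf{x},\mathbf{y}) \models h(\mathbf{x}) > 0$. Similarly, we can prove
$\psi(\mathbf{x},\mathbf{z}) \models h(\mathbf{x}) < 0$, implying that $\psi(\mathbf{x},\mathbf{z}) \models h(\mathbf{x}) \le 0$. Therefore, $h(\mathbf{x}) > 0$ is
an interpolant for $\phi(\mathbf{x},\mathbf{y})$ and $\psi(\mathbf{x},\mathbf{z})$.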
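
Condition (1) can be sanity-checked numerically on a toy instance. The sketch below is ours and purely illustrative: the formulas `phi` and `psi`, the candidate `h`, and the sampling ranges are all hypothetical choices, and random sampling can only refute a candidate, never prove it correct.

```python
import random

def phi(x, y):
    # phi(x, y): 1 - x^2 - y^2 >= 0, with y local to phi (hypothetical example)
    return 1 - x * x - y * y >= 0

def psi(x, z):
    # psi(x, z): 1 - (x - 3)^2 - z^2 >= 0, with z local to psi (hypothetical example)
    return 1 - (x - 3) ** 2 - z * z >= 0

def h(x):
    # Candidate polynomial over the shared variable x only
    return 1.5 - x

def find_violation(samples=20000, seed=0):
    """Return a sampled witness that breaks condition (1), or None if no sample does."""
    rng = random.Random(seed)
    for _ in range(samples):
        x = rng.uniform(-2.0, 5.0)
        w = rng.uniform(-1.5, 1.5)      # plays the role of y for phi and of z for psi
        if phi(x, w) and not h(x) > 0:  # x lies in P_x(phi) but h(x) <= 0
            return ("phi", x, w)
        if psi(x, w) and not h(x) < 0:  # x lies in P_x(psi) but h(x) >= 0
            return ("psi", x, w)
    return None

print(find_violation())
```

In this instance the projections are $[-1,1]$ and $[2,4]$, so $h(x) = 1.5 - x$ is at least $0.5$ on the first and at most $-0.5$ on the second, and no violation can be found.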


In order to synthesize such $h(\mathbf{x})$ in Proposition 1, we first take a closer look at
the two sets $P_{\mathbf{x}}(\phi(\mathbf{x},\mathbf{y}))$ and $P_{\mathbf{x}}(\psi(\mathbf{x},\mathbf{z}))$. As shown later in Lemma 1,
these two sets are compact semi-algebraic sets of the form
$\{\mathbf{x} \mid \bigvee_{i=1}^{c} \bigwedge_{j=1}^{J_i} \alpha_{i,j}(\mathbf{x}) \ge 0\}$. Before this lemma, we introduce the Finiteness theorem
pertinent to basic closed semi-algebraic subsets of $\mathbb{R}^n$, which will be used in the
proof of Lemma 1, where a basic closed semi-algebraic subset of $\mathbb{R}^n$ is a set of
the form $\{\mathbf{x} \in \mathbb{R}^n \mid \alpha_1(\mathbf{x}) \ge 0,\dots,\alpha_k(\mathbf{x}) \ge 0\}$ with $\alpha_1,\dots,\alpha_k \in \mathbb{R}[\mathbf{x}]$.


Theorem 1 (Finiteness Theorem, Theorem 2.7.2 in [3]). Let $A \subseteq \mathbb{R}^n$ be a
closed semi-algebraic set. Then $A$ is a finite union of basic closed semi-algebraic
sets.


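Lemma 1. The set $P_{\mathbf{x}}(\phi(\mathbf{x},\mathbf{y}))$ is a compact semi-algebraic set of the following
form

$$P_{\mathbf{x}}(\phi(\mathbf{x},\mathbf{y})) = \Bigl\{ \mathbf{x} \;\Bigm|\; \bigvee_{i=1}^{c} \bigwedge_{j=1}^{J_i} \alpha_{i,j}(\mathbf{x}) \ge 0 \Bigr\},$$

where $\alpha_{i,j}(\mathbf{x}) \in \mathbb{R}[\mathbf{x}]$, $i = 1,\dots,c$, $j = 1,\dots,J_i$. The same claim applies to the
set $P_{\mathbf{x}}(\psi(\mathbf{x},\mathbf{z}))$ as well.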


Proof. For the sake of simplicity, we denote $\{(\mathbf{x},\mathbf{y}) \mid \phi(\mathbf{x},\mathbf{y})\}$ and $P_{\mathbf{x}}(\phi(\mathbf{x},\mathbf{y}))$
by $S$ and $\pi(S)$, respectively.

Because $S$ is compact and $\pi$ is continuous, $\pi(S)$, being the image of a compact
set under a continuous map, is compact. Moreover, as $S$ is a semi-algebraic set and
the projection of a semi-algebraic set is also a semi-algebraic set by the
Tarski-Seidenberg theorem [2], $\pi(S)$ is a semi-algebraic set. Thus, $\pi(S)$ is a
compact semi-algebraic set.

Since $\pi(S)$ is a compact semi-algebraic set, and hence a closed semi-algebraic
set, $\pi(S)$ is a finite union of basic closed semi-algebraic sets by Theorem 1.
Hence, there exists a series of polynomials
$\alpha_{1,1}(\mathbf{x}),\dots,\alpha_{1,J_1}(\mathbf{x}),\dots,\alpha_{c,1}(\mathbf{x}),\dots,\alpha_{c,J_c}(\mathbf{x})$ such that

$$\pi(S) = \bigcup_{i=1}^{c} \Bigl\{ \mathbf{x} \;\Bigm|\; \bigwedge_{j=1}^{J_i} \alpha_{i,j}(\mathbf{x}) \ge 0 \Bigr\} = \Bigl\{ \mathbf{x} \;\Bigm|\; \bigvee_{i=1}^{c} \bigwedge_{j=1}^{J_i} \alpha_{i,j}(\mathbf{x}) \ge 0 \Bigr\}.$$

This concludes the proof of the lemma.

Knowing from Lemma 1 that $P_{\mathbf{x}}(\phi(\mathbf{x},\mathbf{y}))$ and $P_{\mathbf{x}}(\psi(\mathbf{x},\mathbf{z}))$ are unions
of basic semi-algebraic sets, we next prove the
existence of $h(\mathbf{x}) \in \mathbb{R}[\mathbf{x}]$ satisfying (1), as formally stated in Theorem 2.


Theorem 2. Suppose that $\phi(\mathbf{x},\mathbf{y})$ and $\psi(\mathbf{x},\mathbf{z})$ are defined as in Problem 1, then
there exists a polynomial $h(\mathbf{x})$ satisfying (1).


As pointed out by an anonymous reviewer, Theorem 2 can be obtained from
some properties of the ring of Nash functions proved in [29]. In what follows, we
give a simpler and more intuitive proof. To this end, some preliminaries are
required first. The main tool in our proof is Putinar’s Positivstellensatz, as formulated in
Theorem 3.


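Theorem 3 (Putinar’s Positivstellensatz [31]). Let $p_1,\dots,p_k \in \mathbb{R}[\mathbf{x}]$ and
$S_1 = \{\mathbf{x} \mid p_1(\mathbf{x}) \ge 0,\dots,p_k(\mathbf{x}) \ge 0\}$. Assume that the quadratic
module $\mathcal{M}(p_1,\dots,p_k)$ is Archimedean. For $q \in \mathbb{R}[\mathbf{x}]$, if $q > 0$ on $S_1$ then
$q \in \mathcal{M}(p_1,\dots,p_k)$.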


With Putinar’s Positivstellensatz we can conclude that there exists
a polynomial whose zero level set³ separates two compact semi-algebraic
sets of the Archimedean form, as claimed in Lemmas 2 and 3.


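Lemma 2. Let $S_1 = \{\mathbf{x} \mid p_1(\mathbf{x}) \ge 0,\dots,p_J(\mathbf{x}) \ge 0\}$ and
$S_2 = \{\mathbf{x} \mid q_1(\mathbf{x}) \ge 0,\dots,q_K(\mathbf{x}) \ge 0\}$ be semi-algebraic sets of the Archimedean form
with $S_1 \cap S_2 = \emptyset$. Then there exists a polynomial $h_1(\mathbf{x})$ such that

$$\forall \mathbf{x} \in S_1.\ h_1(\mathbf{x}) > 0, \qquad \forall \mathbf{x} \in S_2.\ h_1(\mathbf{x}) < 0. \qquad (2)$$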

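Proof. Since $S_1 \cap S_2 = \emptyset$, it follows that

$$p_2 \ge 0 \wedge \cdots \wedge p_J \ge 0 \wedge q_1 \ge 0 \wedge \cdots \wedge q_K \ge 0 \models -p_1 > 0.$$

Let $S_3 = \{\mathbf{x} \mid p_2 \ge 0 \wedge \cdots \wedge p_J \ge 0 \wedge q_1 \ge 0 \wedge \cdots \wedge q_K \ge 0\}$, then $-p_1 > 0$
on $S_3$. Since $S_1$ and $S_2$ are semi-algebraic sets of the Archimedean form, it
follows that $\mathcal{M}_{\mathbf{x}}(p_2(\mathbf{x}),\dots,p_J(\mathbf{x}),q_1(\mathbf{x}),\dots,q_K(\mathbf{x}))$ is also Archimedean. Hence, $S_3$
is compact. From $-p_1 > 0$ on $S_3$, we further have that there exists some
$u_1 \in \sum \mathbb{R}[\mathbf{x}]^2$ such that $-u_1 p_1 - 1 > 0$ on $S_3$. Using Theorem 3, we have that

$$-u_1 p_1 - 1 \in \mathcal{M}_{\mathbf{x}}(p_2(\mathbf{x}),\dots,p_J(\mathbf{x}),q_1(\mathbf{x}),\dots,q_K(\mathbf{x})),$$

implying that there exist sums of squares polynomials $u_2,\dots,u_J$ and
$v_0,v_1,\dots,v_K \in \mathbb{R}[\mathbf{x}]$ such that

$$-u_1 p_1 - 1 = u_2 p_2 + \cdots + u_J p_J + v_0 + v_1 q_1 + \cdots + v_K q_K.$$

Let $h_1 = \frac{1}{2} + u_1 p_1 + \cdots + u_J p_J$, i.e., $-h_1 = \frac{1}{2} + v_0 + v_1 q_1 + \cdots + v_K q_K$. It is
easy to check that (2) holds.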
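
To make the construction in this proof concrete, here is a hand-worked instance (an illustrative example of ours, not from the source) with $J = K = 1$, $p_1 = 1 - x^2$ and $q_1 = 1 - (x-3)^2$, so that $S_1 = [-1,1]$ and $S_2 = [2,4]$. The multipliers were picked by hand; other valid choices exist.

```latex
% Illustrative instance of the proof of Lemma 2 (hand-picked multipliers).
% Choose u_1 = 3, v_1 = 1 and v_0 = 4x^2 - 6x + 4 = (2x - 3/2)^2 + 7/4,
% all of which are sums of squares.
\begin{align*}
-u_1 p_1 - 1  &= -3(1 - x^2) - 1 = 3x^2 - 4, \\
v_0 + v_1 q_1 &= (4x^2 - 6x + 4) + \bigl(1 - (x - 3)^2\bigr) = 3x^2 - 4 .
\end{align*}
% Following the proof, h_1 = 1/2 + u_1 p_1:
\begin{align*}
h_1(x)  &= \tfrac{1}{2} + 3(1 - x^2) = \tfrac{7}{2} - 3x^2, \\
-h_1(x) &= \tfrac{1}{2} + v_0 + v_1 q_1 = 3x^2 - \tfrac{7}{2} .
\end{align*}
% On S_1 = [-1, 1]:  h_1(x) >= 1/2 > 0.
% On S_2 = [2, 4]:   h_1(x) <= 7/2 - 12 < 0.
```

The two expressions for $h_1$ agree because of the identity in the first display, which is exactly the role played by the Putinar certificate in the proof.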


3 The zero level set of an $n$-variate polynomial $h(\mathbf{x})$ is defined as $\{\mathbf{x} \in \mathbb{R}^n \mid h(\mathbf{x}) = 0\}$.

Lemma 3 generalizes the result of Lemma 2 to more general compact
semi-algebraic sets of the Archimedean form, namely unions of multiple basic
semi-algebraic sets.


Lemma 3. Assume $S_0 = \{\mathbf{x} \mid p_1(\mathbf{x}) \ge 0,\dots,p_J(\mathbf{x}) \ge 0\}$ and
$S_i = \{\mathbf{x} \mid q_{i,1}(\mathbf{x}) \ge 0,\dots,q_{i,K_i}(\mathbf{x}) \ge 0\}$, $i = 1,\dots,b$, are semi-algebraic sets of the
Archimedean form, and $S_0 \cap \bigcup_{i=1}^{b} S_i = \emptyset$. Then there exists a polynomial $h_0(\mathbf{x})$
such that

$$\forall \mathbf{x} \in S_0.\ h_0(\mathbf{x}) > 0, \qquad \forall \mathbf{x} \in \bigcup_{i=1}^{b} S_i.\ h_0(\mathbf{x}) < 0. \qquad (3)$$


In order to prove this lemma, we prove the following lemma first. 


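Lemma 4. Let $c, d \in \mathbb{R}$ with $0 < c < d$ and $U_0 = [c,d]^r$. There exists a
polynomial $\hat{h}(\mathbf{x})$ such that

$$\mathbf{x} \in U_0 \models \hat{h}(\mathbf{x}) > 0 \models \bigwedge_{i=1}^{r} x_i > 0, \qquad (4)$$

where $\mathbf{x} = (x_1,\dots,x_r)$.


Proof. We show that there exists $k \in \mathbb{N}$ such that

$$\hat{h}(\mathbf{x}) = \Bigl(\frac{d}{2}\Bigr)^{2k} - \Bigl(x_1 - \frac{c+d}{2}\Bigr)^{2k} - \cdots - \Bigl(x_r - \frac{c+d}{2}\Bigr)^{2k}$$

satisfies (4). It is evident that $\hat{h}(\mathbf{x}) > 0 \models \bigwedge_{i=1}^{r} x_i > 0$ holds, since
$\hat{h}(\mathbf{x}) > 0$ forces $|x_i - \frac{c+d}{2}| < \frac{d}{2}$ and hence $x_i > \frac{c}{2} > 0$ for every $i$.
In the following we just need to verify that $\bigwedge_{i=1}^{r} c \le x_i \le d \models \hat{h}(\mathbf{x}) > 0$ holds.

Since $c \le x_i \le d$, we have $(x_i - \frac{c+d}{2})^{2k} \le (\frac{d-c}{2})^{2k}$ and

$$\Bigl(\frac{d}{2}\Bigr)^{2k} - \sum_{i=1}^{r} \Bigl(x_i - \frac{c+d}{2}\Bigr)^{2k} \ge \Bigl(\frac{d}{2}\Bigr)^{2k} - r \Bigl(\frac{d-c}{2}\Bigr)^{2k}.$$

Obviously, if an integer $k$ satisfies $(\frac{d}{d-c})^{2k} > r$, then
$(\frac{d}{2})^{2k} - \sum_{i=1}^{r} (x_i - \frac{c+d}{2})^{2k} > 0$. The existence of such $k$ satisfying
$(\frac{d}{d-c})^{2k} > r$ is assured by $\frac{d}{d-c} > 1$.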
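
The construction in the proof of Lemma 4 is explicit enough to execute. The following sketch (ours; the helper names `smallest_k` and `make_h_hat` and the test values of `c`, `d`, `r` are hypothetical) picks the smallest admissible exponent and checks both implications of (4) on a finite grid, which illustrates the statement but does not prove it.

```python
import itertools

def smallest_k(c, d, r):
    """Smallest integer k >= 1 with (d / (d - c))**(2k) > r, assuming 0 < c < d."""
    k = 1
    while (d / (d - c)) ** (2 * k) <= r:
        k += 1
    return k

def make_h_hat(c, d, r):
    """Build h_hat(x) = (d/2)^(2k) - sum_i (x_i - (c+d)/2)^(2k) for r variables."""
    k = smallest_k(c, d, r)
    mid = (c + d) / 2
    def h_hat(x):
        return (d / 2) ** (2 * k) - sum((xi - mid) ** (2 * k) for xi in x)
    return k, h_hat

c, d, r = 1.0, 2.0, 3
k, h_hat = make_h_hat(c, d, r)

# Grid check of (4): points of [c, d]^r give h_hat > 0, and h_hat > 0 forces x_i > 0.
grid = [i / 4 for i in range(-4, 13)]          # -1.0, -0.75, ..., 3.0
for x in itertools.product(grid, repeat=r):
    if all(c <= xi <= d for xi in x):
        assert h_hat(x) > 0
    if h_hat(x) > 0:
        assert all(xi > 0 for xi in x)
```

With these values $d/(d-c) = 2$, so $k = 1$ already suffices and $\hat{h}(\mathbf{x}) = 1 - \sum_i (x_i - 1.5)^2$.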


Now we give a proof for Lemma 3 as follows. 


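Proof (of Lemma 3). For any $i$ with $1 \le i \le b$, according to Lemma 2, there exists
a polynomial $h_i \in \mathbb{R}[\mathbf{x}]$ satisfying $\forall \mathbf{x} \in S_0.\ h_i(\mathbf{x}) > 0$ and $\forall \mathbf{x} \in S_i.\ h_i(\mathbf{x}) < 0$.
Next, we construct $h_0(\mathbf{x}) \in \mathbb{R}[\mathbf{x}]$ from $h_1(\mathbf{x}),\dots,h_b(\mathbf{x})$. Since $S_0$ is a
semi-algebraic set of the Archimedean form, $S_0$ is compact and thus $h_i(\mathbf{x})$ has a
minimum value and a maximum value on $S_0$, denoted by $c_i$ and $d_i$ respectively. Let
$c = \min(c_1,\dots,c_b)$ and $d = \max(d_1,\dots,d_b)$. Clearly, $0 < c < d$.


From Lemma 4 there must exist a polynomial $\hat{h}(w_1,\dots,w_b)$ such that

$$\bigwedge_{i=1}^{b} c \le w_i \le d \models \hat{h}(w_1,\dots,w_b) > 0, \qquad (5)$$
$$\hat{h}(w_1,\dots,w_b) > 0 \models \bigwedge_{i=1}^{b} w_i > 0. \qquad (6)$$


Let $h_0'(\mathbf{x}) = \hat{h}(h_1(\mathbf{x}),\dots,h_b(\mathbf{x}))$. Obviously, $h_0'(\mathbf{x}) \in \mathbb{R}[\mathbf{x}]$. We next prove that
a polynomial $h_0(\mathbf{x})$ satisfying (3) in Lemma 3 can be obtained from $h_0'(\mathbf{x})$.
For all $\mathbf{x}_0 \in S_0$, we have $c \le h_i(\mathbf{x}_0) \le d$, $i = 1,\dots,b$, and hence
$h_0'(\mathbf{x}_0) = \hat{h}(h_1(\mathbf{x}_0),\dots,h_b(\mathbf{x}_0)) > 0$ by (5). Therefore, the first constraint in (3), i.e.,
$\forall \mathbf{x}_0 \in S_0.\ h_0'(\mathbf{x}_0) > 0$, holds.

For any $\mathbf{x}_0 \in \bigcup_{i=1}^{b} S_i$, there must exist some $i$ such that $\mathbf{x}_0 \in S_i$, implying
that $h_i(\mathbf{x}_0) < 0$. By (6) we have $h_0'(\mathbf{x}_0) = \hat{h}(h_1(\mathbf{x}_0),\dots,h_b(\mathbf{x}_0)) \le 0$.

Thus, we obtain the conclusion that there exists a polynomial $h_0'(\mathbf{x})$ such
that $\forall \mathbf{x} \in S_0.\ h_0'(\mathbf{x}) > 0$, and $\forall \mathbf{x} \in \bigcup_{i=1}^{b} S_i.\ h_0'(\mathbf{x}) \le 0$. Also, since $S_0$ is a
compact set, and $h_0'(\mathbf{x}) > 0$ on $S_0$, there must exist some positive number $\epsilon > 0$
such that $h_0'(\mathbf{x}) - \epsilon > 0$ over $S_0$. Then $h_0'(\mathbf{x}) - \epsilon < 0$ on $\bigcup_{i=1}^{b} S_i$. Therefore,
setting $h_0(\mathbf{x}) := h_0'(\mathbf{x}) - \epsilon$, Lemma 3 is proved.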
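
The composition step of this proof can be traced on a one-dimensional toy instance. In the sketch below (ours), the sets $S_0 = [-1,1]$, $S_1 = [2,3]$, $S_2 = [-3,-2]$ and the separators `h1`, `h2` are hypothetical; the code builds $\hat{h}$ as in Lemma 4 and checks the composed polynomial on sample points only.

```python
def h1(x):   # positive on S0 = [-1, 1], negative on S1 = [2, 3]
    return 1.5 - x

def h2(x):   # positive on S0 = [-1, 1], negative on S2 = [-3, -2]
    return x + 1.5

# On S0 both h1 and h2 take values in [0.5, 2.5], so c = 0.5 and d = 2.5.
c, d, b = 0.5, 2.5, 2

# Smallest k with (d / (d - c))**(2k) > b, as required by Lemma 4.
k = 1
while (d / (d - c)) ** (2 * k) <= b:
    k += 1

def h_hat(w):
    mid = (c + d) / 2
    return (d / 2) ** (2 * k) - sum((wi - mid) ** (2 * k) for wi in w)

def h0_prime(x):
    # Composition from the proof: h0'(x) = h_hat(h1(x), h2(x)).
    return h_hat((h1(x), h2(x)))

def frange(lo, hi, n=200):
    # n + 1 evenly spaced sample points of [lo, hi]
    return [lo + (hi - lo) * i / n for i in range(n + 1)]

assert all(h0_prime(x) > 0 for x in frange(-1, 1))
assert all(h0_prime(x) < 0 for x in frange(2, 3) + frange(-3, -2))
```

Here $k = 2$ and the composition simplifies to $h_0'(x) = 1.25^4 - 2x^4$. Subtracting a small $\epsilon$ below the minimum of $h_0'$ on $S_0$, as in the last step of the proof, keeps both sign conditions.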


In Lemma 3 we proved that there exists a polynomial $h(\mathbf{x}) \in \mathbb{R}[\mathbf{x}]$ such that
its zero level set is a barrier between two semi-algebraic sets of the Archimedean
form, of which one set is a union of finitely many basic semi-algebraic sets. In
the following we will give a formal proof of Theorem 2, which is a generalization
of Lemma 3.


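Proof (of Theorem 2). According to Lemma 1, $P_{\mathbf{x}}(\phi(\mathbf{x},\mathbf{y}))$ and
$P_{\mathbf{x}}(\psi(\mathbf{x},\mathbf{z}))$ are compact sets, and there exist polynomials
$p_{i,j}(\mathbf{x}) \in \mathbb{R}[\mathbf{x}]$, $i = 1,\dots,a$, $j = 1,\dots,J_i$, and $q_{l,k}(\mathbf{x}) \in \mathbb{R}[\mathbf{x}]$, $l = 1,\dots,b$,
$k = 1,\dots,K_l$, such that

$$P_{\mathbf{x}}(\phi(\mathbf{x},\mathbf{y})) = \Bigl\{ \mathbf{x} \;\Bigm|\; \bigvee_{i=1}^{a} \bigwedge_{j=1}^{J_i} p_{i,j}(\mathbf{x}) \ge 0 \Bigr\}, \qquad
P_{\mathbf{x}}(\psi(\mathbf{x},\mathbf{z})) = \Bigl\{ \mathbf{x} \;\Bigm|\; \bigvee_{l=1}^{b} \bigwedge_{k=1}^{K_l} q_{l,k}(\mathbf{x}) \ge 0 \Bigr\}.$$

Since $P_{\mathbf{x}}(\phi(\mathbf{x},\mathbf{y}))$ and $P_{\mathbf{x}}(\psi(\mathbf{x},\mathbf{z}))$ are compact sets, there exists a positive
$N \in \mathbb{R}$ such that $f = N - \sum_{i=1}^{r} x_i^2 \ge 0$ over $P_{\mathbf{x}}(\phi(\mathbf{x},\mathbf{y}))$ and $P_{\mathbf{x}}(\psi(\mathbf{x},\mathbf{z}))$.
For each $i = 1,\dots,a$ and each $l = 1,\dots,b$, set $p_{i,0} = q_{l,0} = f$. Denote
$\{\mathbf{x} \mid \bigvee_{i=1}^{a} \bigwedge_{j=0}^{J_i} p_{i,j}(\mathbf{x}) \ge 0\} = \bigcup_{i=1}^{a} \{\mathbf{x} \mid \bigwedge_{j=0}^{J_i} p_{i,j}(\mathbf{x}) \ge 0\}$ by $P_1$ and
$\{\mathbf{x} \mid \bigvee_{l=1}^{b} \bigwedge_{k=0}^{K_l} q_{l,k}(\mathbf{x}) \ge 0\} = \bigcup_{l=1}^{b} \{\mathbf{x} \mid \bigwedge_{k=0}^{K_l} q_{l,k}(\mathbf{x}) \ge 0\}$ by $P_2$. It is easy
to see that $P_1 = P_{\mathbf{x}}(\phi(\mathbf{x},\mathbf{y}))$ and $P_2 = P_{\mathbf{x}}(\psi(\mathbf{x},\mathbf{z}))$.

Since $\phi \wedge \psi \models \bot$, there does not exist $(\mathbf{x},\mathbf{y},\mathbf{z}) \in \mathbb{R}^{r+s+t}$ that satisfies $\phi \wedge \psi$,
implying that $P_{\mathbf{x}}(\phi(\mathbf{x},\mathbf{y})) \cap P_{\mathbf{x}}(\psi(\mathbf{x},\mathbf{z})) = \emptyset$ and thus $P_1 \cap P_2 = \emptyset$. Also, since
$\{\mathbf{x} \mid \bigwedge_{j=0}^{J_{i_1}} p_{i_1,j}(\mathbf{x}) \ge 0\} \subseteq P_1$ for each $i_1 = 1,\dots,a$,
$\{\mathbf{x} \mid \bigwedge_{j=0}^{J_{i_1}} p_{i_1,j}(\mathbf{x}) \ge 0\} \cap P_2 = \emptyset$ holds. By Lemma 3 there exists $h_{i_1}(\mathbf{x}) \in \mathbb{R}[\mathbf{x}]$ such that

$$\forall \mathbf{x} \in \Bigl\{ \mathbf{x} \;\Bigm|\; \bigwedge_{j=0}^{J_{i_1}} p_{i_1,j}(\mathbf{x}) \ge 0 \Bigr\}.\ h_{i_1}(\mathbf{x}) > 0, \qquad \forall \mathbf{x} \in P_2.\ h_{i_1}(\mathbf{x}) < 0.$$

Let $S' = \{\mathbf{x} \mid -h_1(\mathbf{x}) \ge 0,\dots,-h_a(\mathbf{x}) \ge 0,\ N - \sum_{i=1}^{r} x_i^2 \ge 0\}$. Obviously,
$S'$ is a semi-algebraic set of the Archimedean form, $P_2 \subseteq S'$ and $P_1 \cap S' = \emptyset$.
Therefore, according to Lemma 2, there exists a polynomial $\bar{h}(\mathbf{x}) \in \mathbb{R}[\mathbf{x}]$ such
that $\forall \mathbf{x} \in S'.\ \bar{h}(\mathbf{x}) > 0$ and $\forall \mathbf{x} \in P_1.\ \bar{h}(\mathbf{x}) < 0$. Let $h(\mathbf{x}) = -\bar{h}(\mathbf{x})$, then
we have $\forall \mathbf{x} \in P_1.\ h(\mathbf{x}) > 0$ and $\forall \mathbf{x} \in P_2.\ h(\mathbf{x}) < 0$, implying that
$\forall \mathbf{x} \in P_{\mathbf{x}}(\phi(\mathbf{x},\mathbf{y})).\ h(\mathbf{x}) > 0$ and $\forall \mathbf{x} \in P_{\mathbf{x}}(\psi(\mathbf{x},\mathbf{z})).\ h(\mathbf{x}) < 0$. This completes the
proof of Theorem 2.

Consequently, we immediately have the following conclusion.


Corollary 1. Let $\phi(\mathbf{x},\mathbf{y})$ and $\psi(\mathbf{x},\mathbf{z})$ be defined as in Problem 1. There must
exist a polynomial $h(\mathbf{x}) \in \mathbb{R}[\mathbf{x}]$ such that $h(\mathbf{x}) > 0$ is an interpolant for $\phi$ and $\psi$.


Actually, since $P_{\mathbf{x}}(\phi(\mathbf{x},\mathbf{y}))$ and $P_{\mathbf{x}}(\psi(\mathbf{x},\mathbf{z}))$ are both compact sets by Lemma
1, and $h(\mathbf{x}) > 0$ on $P_{\mathbf{x}}(\phi(\mathbf{x},\mathbf{y}))$ and $h(\mathbf{x}) < 0$ on $P_{\mathbf{x}}(\psi(\mathbf{x},\mathbf{z}))$, we can obtain $h'(\mathbf{x})$
by applying a small perturbation to the coefficients of $h(\mathbf{x})$ such that $h'(\mathbf{x})$ retains the
property of $h(\mathbf{x})$. Hence, intuitively, there should exist an $h(\mathbf{x}) \in \mathbb{Q}[\mathbf{x}]$ such that
$h(\mathbf{x}) > 0$ is an interpolant for $\phi$ and $\psi$.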


Theorem 4. Let $\phi(\mathbf{x},\mathbf{y})$ and $\psi(\mathbf{x},\mathbf{z})$ be defined as in Problem 1. There must
exist a polynomial $h(\mathbf{x}) \in \mathbb{Q}[\mathbf{x}]$ such that $h(\mathbf{x}) > 0$ is an interpolant for $\phi$ and
$\psi$.


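Proof. We just need to prove there exists a polynomial $h(\mathbf{x}) \in \mathbb{Q}[\mathbf{x}]$ satisfying
(1).

By Theorem 2, there exists a polynomial $h'(\mathbf{x}) \in \mathbb{R}[\mathbf{x}]$ satisfying (1). Since
$P_{\mathbf{x}}(\phi(\mathbf{x},\mathbf{y}))$ and $P_{\mathbf{x}}(\psi(\mathbf{x},\mathbf{z}))$ are compact sets, $h'(\mathbf{x}) > 0$ on $P_{\mathbf{x}}(\phi(\mathbf{x},\mathbf{y}))$ and
$h'(\mathbf{x}) < 0$ on $P_{\mathbf{x}}(\psi(\mathbf{x},\mathbf{z}))$, there exist $\eta_1 > 0$ and $\eta_2 > 0$ such that

$$\forall \mathbf{x} \in P_{\mathbf{x}}(\phi(\mathbf{x},\mathbf{y})).\ h'(\mathbf{x}) - \eta_1 \ge 0, \qquad \forall \mathbf{x} \in P_{\mathbf{x}}(\psi(\mathbf{x},\mathbf{z})).\ h'(\mathbf{x}) + \eta_2 \le 0.$$

Let $\eta = \min(\frac{\eta_1}{2}, \frac{\eta_2}{2})$. Suppose $h'(\mathbf{x}) \in \mathbb{R}[\mathbf{x}]$ has the form
$h'(\mathbf{x}) = \sum_{\alpha \in \Omega} c_\alpha \mathbf{x}^\alpha$,
where $\alpha \in \mathbb{N}^r$, $\Omega \subset \mathbb{N}^r$ is a finite set of indices, $r$ is the dimension of $\mathbf{x}$, $\mathbf{x}^\alpha$ is
the monomial $x_1^{\alpha_1} \cdots x_r^{\alpha_r}$, and $0 \ne c_\alpha \in \mathbb{R}$ is the coefficient of monomial $\mathbf{x}^\alpha$. Let
$N = |\Omega|$ be the cardinality of $\Omega$. Since $P_{\mathbf{x}}(\phi(\mathbf{x},\mathbf{y}))$ and $P_{\mathbf{x}}(\psi(\mathbf{x},\mathbf{z}))$ are compact
sets, for any $\alpha \in \Omega$, there exists $M_\alpha > 0$ such that
$M_\alpha = \max\{|\mathbf{x}^\alpha| \mid \mathbf{x} \in P_{\mathbf{x}}(\phi(\mathbf{x},\mathbf{y})) \cup P_{\mathbf{x}}(\psi(\mathbf{x},\mathbf{z}))\}$.
Then for any fixed polynomial $\hat{h}(\mathbf{x}) = \sum_{\alpha \in \Omega} d_\alpha \mathbf{x}^\alpha$
with $d_\alpha \in [c_\alpha - \frac{\eta}{N M_\alpha},\ c_\alpha + \frac{\eta}{N M_\alpha}]$ and any
$\mathbf{x} \in P_{\mathbf{x}}(\phi(\mathbf{x},\mathbf{y})) \cup P_{\mathbf{x}}(\psi(\mathbf{x},\mathbf{z}))$, we have

$$|\hat{h}(\mathbf{x}) - h'(\mathbf{x})| = \Bigl| \sum_{\alpha \in \Omega} (d_\alpha - c_\alpha) \mathbf{x}^\alpha \Bigr| \le \sum_{\alpha \in \Omega} M_\alpha |d_\alpha - c_\alpha| \le \sum_{\alpha \in \Omega} M_\alpha \cdot \frac{\eta}{N M_\alpha} = \eta.$$

Since $\eta = \min(\frac{\eta_1}{2}, \frac{\eta_2}{2})$, we have

$$\forall \mathbf{x} \in P_{\mathbf{x}}(\phi(\mathbf{x},\mathbf{y})).\ \hat{h}(\mathbf{x}) \ge \frac{\eta_1}{2} > 0, \qquad \forall \mathbf{x} \in P_{\mathbf{x}}(\psi(\mathbf{x},\mathbf{z})).\ \hat{h}(\mathbf{x}) \le -\frac{\eta_2}{2} < 0. \qquad (7)$$

Since (7) holds for any $d_\alpha \in [c_\alpha - \frac{\eta}{N M_\alpha},\ c_\alpha + \frac{\eta}{N M_\alpha}]$, there must exist some
rational number $r_\alpha \in \mathbb{Q}$ in $[c_\alpha - \frac{\eta}{N M_\alpha},\ c_\alpha + \frac{\eta}{N M_\alpha}]$ satisfying (7) because of the
density of rational numbers. Thus, let $h(\mathbf{x}) = \sum_{\alpha \in \Omega} r_\alpha \mathbf{x}^\alpha$. Clearly, it follows
that $h(\mathbf{x}) \in \mathbb{Q}[\mathbf{x}]$ and (1) holds.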
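
The rounding argument in this proof can be imitated with exact rational arithmetic. The sketch below is ours and rests on several assumptions: the projections $[-1,1]$ and $[2,3]$ and the polynomial `h_real` are hypothetical, $\eta_1$ and $\eta_2$ are only estimated on a grid, and Python floats (which are themselves rationals with large denominators) stand in for real coefficients. It searches for a small-denominator rational inside each interval $[c_\alpha - \frac{\eta}{N M_\alpha},\ c_\alpha + \frac{\eta}{N M_\alpha}]$.

```python
from fractions import Fraction
import math

def evaluate(poly, x):
    """Evaluate a univariate polynomial given as {exponent: coefficient}."""
    return sum(c * x ** a for a, c in poly.items())

def grid(lo, hi, n=400):
    return [lo + (hi - lo) * i / n for i in range(n + 1)]

def rationalize(coeffs, tol):
    """Pick r_alpha in Q with |r_alpha - c_alpha| <= tol[alpha], trying small denominators first."""
    out = {}
    for alpha, c in coeffs.items():
        exact = Fraction(c)              # exact value of the float c
        den = 1
        r = exact.limit_denominator(den)
        while abs(r - exact) > tol[alpha]:
            den *= 10
            r = exact.limit_denominator(den)
        out[alpha] = r
    return out

# Hypothetical data: P_x(phi) = [-1, 1], P_x(psi) = [2, 3], h'(x) = sqrt(2) - x^2 / sqrt(3).
h_real = {0: math.sqrt(2), 2: -1 / math.sqrt(3)}
P_phi, P_psi = grid(-1, 1), grid(2, 3)

eta1 = min(evaluate(h_real, x) for x in P_phi)      # grid estimate of eta_1
eta2 = min(-evaluate(h_real, x) for x in P_psi)     # grid estimate of eta_2
eta = min(eta1 / 2, eta2 / 2)
N = len(h_real)
M = {a: max(abs(x ** a) for x in P_phi + P_psi) for a in h_real}
tol = {a: Fraction(eta) / (N * Fraction(M[a])) for a in h_real}

h_rat = rationalize(h_real, tol)
print(h_rat)
```

The rounded polynomial `h_rat` keeps the sign pattern of `h_real` on both sample sets, which is the content of (7) restricted to the grid.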


So, the existence of $h(\mathbf{x}) \in \mathbb{Q}[\mathbf{x}]$ is guaranteed. Moreover, from the proof of
Theorem 4, we know that a small perturbation of $h(\mathbf{x})$ is permitted, which is a
good property for computing $h(\mathbf{x})$ numerically. In the next section,
we recast the problem of finding such $h(\mathbf{x})$ as a semi-definite programming
problem.


4 SOS Formulation


Similar to [7], in this section, we discuss how to reduce the problem of finding
$h(\mathbf{x})$ satisfying (1) to a sum of squares programming problem.


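Theorem 5. Let $\phi(\mathbf{x},\mathbf{y})$ and $\psi(\mathbf{x},\mathbf{z})$ be defined as in Problem 1. Then there
exist $m + n + 2$ SOS (sum of squares) polynomials $u_i(\mathbf{x},\mathbf{y})$ ($i = 1,\dots,m+1$),
$v_j(\mathbf{x},\mathbf{z})$ ($j = 1,\dots,n+1$) and a polynomial $h(\mathbf{x})$ such that

$$h - 1 = \sum_{i=1}^{m} u_i f_i + u_{m+1}, \qquad -h - 1 = \sum_{j=1}^{n} v_j g_j + v_{n+1}, \qquad (8)$$

and $h(\mathbf{x}) > 0$ is an interpolant for $\phi(\mathbf{x},\mathbf{y})$ and $\psi(\mathbf{x},\mathbf{z})$.

Proof. By Theorem 2 there exists a polynomial $\hat{h}(\mathbf{x})$ such that

$$\forall \mathbf{x} \in P_{\mathbf{x}}(\phi(\mathbf{x},\mathbf{y})).\ \hat{h}(\mathbf{x}) > 0, \qquad \forall \mathbf{x} \in P_{\mathbf{x}}(\psi(\mathbf{x},\mathbf{z})).\ \hat{h}(\mathbf{x}) < 0.$$

Set $S_1 = \{(\mathbf{x},\mathbf{y}) \mid f_1 \ge 0,\dots,f_m \ge 0\}$ and $S_2 = \{(\mathbf{x},\mathbf{z}) \mid g_1 \ge 0,\dots,g_n \ge 0\}$.
Since $\hat{h}(\mathbf{x}) > 0$ on $S_1$, which is compact, there exists $\epsilon_1 > 0$ such that
$\hat{h}(\mathbf{x}) - \epsilon_1 > 0$ on $S_1$. Similarly, there exists $\epsilon_2 > 0$ such that $-\hat{h}(\mathbf{x}) - \epsilon_2 > 0$
on $S_2$. Let $\epsilon = \min(\epsilon_1,\epsilon_2)$ and $h(\mathbf{x}) = \hat{h}(\mathbf{x})/\epsilon$, then $h(\mathbf{x}) - 1 > 0$ on $S_1$ and
$-h(\mathbf{x}) - 1 > 0$ on $S_2$. Since $\mathcal{M}_{\mathbf{x},\mathbf{y}}(f_1(\mathbf{x},\mathbf{y}),\dots,f_m(\mathbf{x},\mathbf{y}))$ is Archimedean, from
Theorem 3, we have $h(\mathbf{x}) - 1 \in \mathcal{M}_{\mathbf{x},\mathbf{y}}(f_1(\mathbf{x},\mathbf{y}),\dots,f_m(\mathbf{x},\mathbf{y}))$. Similarly,
$-h(\mathbf{x}) - 1 \in \mathcal{M}_{\mathbf{x},\mathbf{z}}(g_1(\mathbf{x},\mathbf{z}),\dots,g_n(\mathbf{x},\mathbf{z}))$. That is, there exist $m + n + 2$ SOS polynomials
$u_i$, $v_j$ satisfying the following semi-definite constraints:

$$h(\mathbf{x}) - 1 = \sum_{i=1}^{m} u_i f_i + u_{m+1}, \qquad -h(\mathbf{x}) - 1 = \sum_{j=1}^{n} v_j g_j + v_{n+1}.$$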


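According to Theorem 5, the problem of finding $h(\mathbf{x}) \in \mathbb{R}[\mathbf{x}]$ solving
Problem 1 can be equivalently reformulated as the problem of searching for
SOS polynomials $u_1(\mathbf{x},\mathbf{y}),\dots,u_m(\mathbf{x},\mathbf{y})$, $v_1(\mathbf{x},\mathbf{z}),\dots,v_n(\mathbf{x},\mathbf{z})$ and a polynomial
$h(\mathbf{x})$ with appropriate degrees such that

$$\begin{aligned}
& h(\mathbf{x}) - 1 - \sum_{i=1}^{m} u_i f_i \in \sum \mathbb{R}[\mathbf{x},\mathbf{y}]^2, \\
& -h(\mathbf{x}) - 1 - \sum_{j=1}^{n} v_j g_j \in \sum \mathbb{R}[\mathbf{x},\mathbf{z}]^2, \\
& u_i \in \sum \mathbb{R}[\mathbf{x},\mathbf{y}]^2, \quad i = 1,\dots,m, \\
& v_j \in \sum \mathbb{R}[\mathbf{x},\mathbf{z}]^2, \quad j = 1,\dots,n.
\end{aligned} \qquad (9)$$

(9) is a set of SOS constraints over the SOS multipliers $u_1(\mathbf{x},\mathbf{y}),\dots,u_m(\mathbf{x},\mathbf{y})$,
$v_1(\mathbf{x},\mathbf{z}),\dots,v_n(\mathbf{x},\mathbf{z})$ and the polynomial $h(\mathbf{x})$; it is convex and can be solved by many
existing semi-definite programming solvers, such as the optimization library
AiSat [7] built on CSDP [4]. Therefore, according to Theorem 5, $h(\mathbf{x}) > 0$ is
an interpolant for $\phi$ and $\psi$, which is formulated in Theorem 6.

Theorem 6 (Soundness). Suppose that $\phi(\mathbf{x},\mathbf{y})$ and $\psi(\mathbf{x},\mathbf{z})$ are defined as in
Problem 1, and $h(\mathbf{x})$ is a feasible solution to (9), then $h(\mathbf{x})$ solves Problem 1,
i.e., $h(\mathbf{x}) > 0$ is an interpolant for $\phi$ and $\psi$.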
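
Soundness means that a candidate $h$ can be trusted once SOS certificates of the shape (8) are exhibited. The sketch below (ours) checks such certificates exactly, with rational arithmetic, on a hypothetical univariate instance with $m = n = 1$; the polynomials, multipliers and the Gram matrix `C_v2` are hand-picked assumptions, and no SDP solver is involved.

```python
from fractions import Fraction as F

def add(p, q):
    """Add two polynomials given as coefficient lists (index = degree)."""
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]

def mul(p, q):
    """Multiply two polynomials given as coefficient lists."""
    out = [F(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def trim(p):
    """Drop trailing zero coefficients so that equal polynomials compare equal."""
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p

def gram_to_poly(C):
    """Polynomial E(x)^T C E(x) for the monomial vector E(x) = (1, x, ..., x^d)."""
    d = len(C)
    out = [F(0)] * (2 * d - 1)
    for i in range(d):
        for j in range(d):
            out[i + j] += C[i][j]
    return out

def is_psd_2x2(C):
    # A symmetric 2x2 matrix is PSD iff its diagonal entries and determinant are nonnegative.
    det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
    return C[0][1] == C[1][0] and C[0][0] >= 0 and C[1][1] >= 0 and det >= 0

f1 = [F(1), F(0), F(-1)]        # f1 = 1 - x^2          (phi: f1 >= 0)
g1 = [F(-8), F(6), F(-1)]       # g1 = 1 - (x - 3)^2    (psi: g1 >= 0)
h = [F(7), F(0), F(-6)]         # candidate h = 7 - 6x^2

u1, u2 = [F(6)], [F(0)]                       # nonnegative constants are SOS
v1 = [F(2)]
C_v2 = [[F(8), F(-6)], [F(-6), F(8)]]         # Gram matrix of v2 = 8 - 12x + 8x^2
v2 = gram_to_poly(C_v2)

lhs1 = add(h, [F(-1)])                        # h - 1
lhs2 = add([-c for c in h], [F(-1)])          # -h - 1

assert trim(lhs1) == trim(add(mul(u1, f1), u2))   # h - 1  = u1*f1 + u2
assert trim(lhs2) == trim(add(mul(v1, g1), v2))   # -h - 1 = v1*g1 + v2
assert is_psd_2x2(C_v2)                           # v2 is a sum of squares
```

Because both identities hold exactly and every multiplier is SOS, $h - 1 \ge 0$ wherever $f_1 \ge 0$ and $-h - 1 \ge 0$ wherever $g_1 \ge 0$, so $h(x) > 0$ separates the two sets in this toy instance.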


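Moreover, we have the following completeness theorem, stating that if the
degrees of $h(\mathbf{x}) \in \mathbb{R}[\mathbf{x}]$ and $u_i(\mathbf{x},\mathbf{y}) \in \sum \mathbb{R}[\mathbf{x},\mathbf{y}]^2$, $v_j(\mathbf{x},\mathbf{z}) \in \sum \mathbb{R}[\mathbf{x},\mathbf{z}]^2$,
$i = 1,\dots,m$, $j = 1,\dots,n$, are large enough, $h(\mathbf{x})$ can definitely be synthesized by
solving (9).


Theorem 7 (Completeness). For Problem 1, there must be polynomials
$u_i(\mathbf{x},\mathbf{y}) \in \mathbb{R}_N[\mathbf{x},\mathbf{y}]$ ($i = 1,\dots,m$), $v_j(\mathbf{x},\mathbf{z}) \in \mathbb{R}_N[\mathbf{x},\mathbf{z}]$ ($j = 1,\dots,n$) and
$h(\mathbf{x}) \in \mathbb{R}_N[\mathbf{x}]$ satisfying (9) for some positive integer $N$, where $\mathbb{R}_k[\cdot]$ stands
for the family of polynomials of degree no more than $k$.


Proof. This is an immediate result of Theorem 5.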
Example 2. Consider two contradictory formulas $\phi$ and $\psi$ defined by

$$f_1(x,y,z,a_1,b_1,c_1,d_1) \ge 0 \wedge f_2(x,y,z,a_1,b_1,c_1,d_1) \ge 0 \wedge f_3(x,y,z,a_1,b_1,c_1,d_1) \ge 0,$$
$$g_1(x,y,z,a_2,b_2,c_2,d_2) \ge 0 \wedge g_2(x,y,z,a_2,b_2,c_2,d_2) \ge 0 \wedge g_3(x,y,z,a_2,b_2,c_2,d_2) \ge 0,$$

respectively, where


fi=4-r -y -z-ai -bi -a-di fe = —y* +22" — ai — 1/100, 


2 2 2 2 2 2 2 2 2 2 
fs zZ bi C1 di T L gı = 4 T yY zZ ag b3 C2 ds, 


g2 = x y a2 b2 d 3, g3 =T. 


It is easy to observe that $\phi$ and $\psi$ satisfy the conditions in Problem 1. Since
there are local variables in $\phi$ and $\psi$ and the degree of $f_2$ is 4, the interpolant
generation methods in [7] and [10] are not applicable. We get a concrete SDP
problem of the form (9) by setting the degree of the polynomial $h(x,y,z)$ in (9)
to be 2. Using the MATLAB package YALMIP [23] and Mosek [28], we obtain

$$h(x,y,z) = -416.7204 - 914.7840x + 472.6184y + 199.8985x^2 + 190.2252y^2 + 690.4208z^2 - 187.1592xy.$$

Pictorially, we plot $P_{x,y,z}(\phi(x,y,z,a_1,b_1,c_1,d_1))$, $P_{x,y,z}(\psi(x,y,z,a_2,b_2,c_2,d_2))$
and $\{(x,y,z) \mid h(x,y,z) > 0\}$ in Fig. 2. It is evident that $h(x,y,z)$ as
presented above, of degree 2, is indeed an interpolant for $\phi(x,y,z,a_1,b_1,c_1,d_1)$ and
$\psi(x,y,z,a_2,b_2,c_2,d_2)$.


5 Avoidance of the Unsoundness Due to Numerical Error in SDP


In this section, we discuss how to avoid the unsoundness of our approach
caused by numerical error in SDP, based on the work in [32].

Fig. 2. Example 2. (Red region: $P_{x,y,z}(\phi(x,y,z,a_1,b_1,c_1,d_1))$; green region:
$P_{x,y,z}(\psi(x,y,z,a_2,b_2,c_2,d_2))$; gray region: $\{(x,y,z) \mid h(x,y,z) > 0\}$.) (Color figure
online)

A square matrix $A$ is positive semidefinite, denoted by $A \succeq 0$, if $A$ is real
symmetric and all its eigenvalues are nonnegative.

In order to solve formula (9) to obtain $h(\mathbf{x})$, we first need to fix a
degree bound of $u_i$, $v_j$ and $h$, say $2d$, $d \in \mathbb{N}$. It is well known that any
$u(\mathbf{x}) \in \sum \mathbb{R}[\mathbf{x}]^2$ with degree $2d$ can be represented by

$$u(\mathbf{x}) = E_d(\mathbf{x})^T C_u E_d(\mathbf{x}), \qquad (10)$$

where $C_u \in \mathbb{R}^{\binom{r+d}{d} \times \binom{r+d}{d}}$ with $C_u \succeq 0$, $E_d(\mathbf{x})$ is a column vector with all
monomials in $\mathbf{x}$ whose total degree is not greater than $d$, and $E_d(\mathbf{x})^T$ stands
for the transposition of $E_d(\mathbf{x})$. Equating the corresponding coefficients of each
monomial whose degree is less than or equal to $2d$ on the two sides of (10), we
get a linear equation system

$$\mathrm{tr}(A_{u,k} C_u) = b_{u,k}, \quad k = 1,\dots,K_u, \qquad (11)$$

where $A_{u,k} \in \mathbb{R}^{\binom{r+d}{d} \times \binom{r+d}{d}}$ is a constant matrix, $b_{u,k} \in \mathbb{R}$ is a constant, and $\mathrm{tr}(A)$ stands
for the trace of matrix $A$. Thus, searching for $u_i$, $v_j$ and $h$ satisfying (9) can be
reduced to the following SDP problem:

$$\begin{aligned}
\text{find: } & C_{u_1},\dots,C_{u_m}, C_{v_1},\dots,C_{v_n}, C_h, \\
\text{s.t. } & \mathrm{tr}(A_{u_i,k} C_{u_i}) = b_{u_i,k}, \quad i = 1,\dots,m,\ k = 1,\dots,K_{u_i}, \\
& \mathrm{tr}(A_{v_j,k} C_{v_j}) = b_{v_j,k}, \quad j = 1,\dots,n,\ k = 1,\dots,K_{v_j}, \\
& \mathrm{tr}(A_{h,k} C_h) = b_{h,k}, \quad k = 1,\dots,K_h, \\
& \mathrm{diag}(C_{u_1},\dots,C_{u_m},C_{v_1},\dots,C_{v_n},C_{h-1-uf},C_{-h-1-vg}) \succeq 0,
\end{aligned} \qquad (12)$$

where $C_{h-1-uf}$ is the matrix corresponding to the polynomial $h - 1 - \sum_{i=1}^{m} u_i f_i$,
which is a linear combination of $C_{u_1},\dots,C_{u_m}$ and $C_h$; similarly, $C_{-h-1-vg}$ is
the matrix corresponding to the polynomial $-h - 1 - \sum_{j=1}^{n} v_j g_j$, which is a linear
combination of $C_{v_1},\dots,C_{v_n}$ and $C_h$; and $\mathrm{diag}(C_1,\dots,C_k)$ is a block-diagonal


matrix of C1, ..., Ck. 
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Let D be the dimension of C = = diag(Cy,,...,C-n—i—vg), ie. 

diag(Cu,,---;C_n—1-vg) E€ R?*” and C be the approximate solution to (12) 
returned by calling a numerical SDP solver, the following theorem is proved in 
[32]. 
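Theorem 8 ([32], Theorem 3). $C \succeq 0$ if there exists $\tilde{C} \in \mathbb{F}^{D \times D}$ such that
the following conditions hold: 1. $\tilde{C}_{ij} = C_{ij}$ for any $i \neq j$; 2. $\tilde{C}_{ii} \le C_{ii} - \alpha$ for
any $i$; and 3. the Cholesky algorithm implemented in floating-point arithmetic
can conclude that $\tilde{C}$ is positive semi-definite, where $\mathbb{F}$ is a floating-point format,
$\alpha = \frac{(D+1)\kappa}{1-(D+2)\kappa}\,\mathrm{tr}(C) + 4(D+1)\bigl(2(D+2) + \max_i\{C_{ii}\}\bigr)\eta$, in which $\kappa$ is the unit
roundoff of $\mathbb{F}$ and $\eta$ is the underflow unit of $\mathbb{F}$.

Corollary 2. Let $\tilde{C} \in \mathbb{F}^{D \times D}$. Suppose that $\frac{(D+1)\kappa}{1-(D+2)\kappa} + 4(D+1)\eta < \frac{1}{2}$ and
$\beta = \frac{(D+1)\kappa}{1-(D+2)\kappa}\,\mathrm{tr}(\tilde{C}) + 4(D+1)\bigl(2(D+2) + \max_i\{\tilde{C}_{ii}\}\bigr)\eta > 0$, where $\mathbb{F}$ is a floating-
point format. Then $\tilde{C} + 2\beta I \succeq 0$ if the Cholesky algorithm based on floating-point
arithmetic succeeds on $\tilde{C}$, i.e., concludes that $\tilde{C}$ is positive semi-definite.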


According to Remark 5 in [32], for IEEE 754 binary64 format with rounding
to nearest, $\kappa = 2^{-53}$ ($\approx 10^{-16}$) and $\eta = 2^{-1075}$ ($\approx 10^{-323}$). In this case, the order
of magnitude of $\beta$ is $10^{-10}$ and $\frac{(D+1)\kappa}{1-(D+2)\kappa} + 4(D+1)\eta$ is $10^{-13}$, much less than $\frac{1}{2}$.
Obviously, $\beta$ becomes smaller when the length of the binary format becomes longer.
W.l.o.g., we suppose that the Cholesky algorithm succeeds on $\hat{C}$, the
solution of (12). This is reasonable: if an SDP solver returns a solution $\hat{C}$,
then $\hat{C}$ should be considered positive semi-definite in the sense of numeric
computation.

So, by Corollary 2, $\hat{C} + 2\beta I \succeq 0$ holds, where $I$ is the identity matrix
of the corresponding dimension. Then we have

$$\mathrm{diag}(\hat{C}_{u_1}, \ldots, \hat{C}_{u_m}, \hat{C}_{v_1}, \ldots, \hat{C}_{v_n}, \hat{C}_{h-1-uf}, \hat{C}_{-h-1-vg}) + 2\beta I \succeq 0.$$
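The check behind Corollary 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `cholesky_succeeds` is a textbook floating-point Cholesky that treats any nonpositive pivot as failure, which need not coincide with the exact variant analysed in [32]; the constants `KAPPA` and `ETA` are the binary64 values quoted from Remark 5 of [32]; the form of the hypothesis in `hypothesis_holds` follows the corollary as reconstructed here; and the matrix `C_hat` is a made-up toy input, not a solver result from the paper.

```python
import math
from fractions import Fraction

KAPPA = Fraction(1, 2**53)    # unit roundoff of binary64
ETA = Fraction(1, 2**1075)    # underflow unit of binary64, as quoted from [32]


def cholesky_succeeds(C):
    """Textbook floating-point Cholesky; a nonpositive pivot counts as failure."""
    n = len(C)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        pivot = C[j][j] - sum(L[j][k] * L[j][k] for k in range(j))
        if pivot <= 0.0:
            return False
        L[j][j] = math.sqrt(pivot)
        for i in range(j + 1, n):
            L[i][j] = (C[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return True


def beta(C):
    """Exact rational value of beta from Corollary 2 for a matrix of floats."""
    D = len(C)
    diag = [Fraction(C[i][i]) for i in range(D)]
    f = (D + 1) * KAPPA / (1 - (D + 2) * KAPPA)
    return f * sum(diag) + 4 * (D + 1) * (2 * (D + 2) + max(diag)) * ETA


def hypothesis_holds(D):
    """Size condition of Corollary 2 (assumed form, see lead-in)."""
    f = (D + 1) * KAPPA / (1 - (D + 2) * KAPPA)
    return f + 4 * (D + 1) * ETA < Fraction(1, 2)


# Hypothetical 2x2 stand-in for a solver-returned block-diagonal matrix.
C_hat = [[2.0, 1.0], [1.0, 2.0]]
if hypothesis_holds(len(C_hat)) and beta(C_hat) > 0 and cholesky_succeeds(C_hat):
    print("shift 2*beta that certifies C_hat + 2*beta*I >= 0:", float(2 * beta(C_hat)))
```

Rational arithmetic is used for $\beta$ because $2^{-1075}$ is below the smallest positive binary64 number and would otherwise round to zero.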


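Let $\varepsilon = \max_{p \in P,\, 1 \le i \le K_p} |\mathrm{tr}(A_{p,i} \hat{C}_p) - b_{p,i}|$, where
$P = \{u_1, \ldots, u_m, v_1, \ldots, v_n, h\}$, which can be regarded as the tolerance of the SDP
solver. Since $|\mathrm{tr}(A_{p,i} \hat{C}_p) - b_{p,i}|$ is the error term for each monomial of $p$,
$\varepsilon$ can be considered as the error bound on the coefficients of the polynomials
$u_i$, $v_j$ and $h$. Hence, for any polynomial $\hat{u}_i$ (resp. $\hat{v}_j$ and $\hat{h}$) computed from
(11) by replacing $C_u$ with the corresponding $\hat{C}_u$, there exists a corresponding
remainder term $R_{u_i}$ (resp. $R_{v_j}$ and $R_h$) with degree not greater than $2d$, whose
coefficients are bounded by $\varepsilon$. Hence, we have

$$
\begin{aligned}
& \hat{u}_i + R_{u_i} + 2\beta E_d(x, y)^T E_d(x, y) \in \textstyle\sum \mathbb{R}[x, y]^2, \quad i = 1, \ldots, m, \\
& \hat{v}_j + R_{v_j} + 2\beta E_d(x, z)^T E_d(x, z) \in \textstyle\sum \mathbb{R}[x, z]^2, \quad j = 1, \ldots, n, \\
& \hat{h} + R_h - 1 - \sum_{i=1}^{m} (\hat{u}_i + R'_{u_i}) f_i + 2\beta E_d(x, y)^T E_d(x, y) \in \textstyle\sum \mathbb{R}[x, y]^2, \\
& -\hat{h} + R'_h - 1 - \sum_{j=1}^{n} (\hat{v}_j + R'_{v_j}) g_j + 2\beta E_d(x, z)^T E_d(x, z) \in \textstyle\sum \mathbb{R}[x, z]^2.
\end{aligned} \tag{13}
$$

Now, in order to avoid unsoundness of our approach caused by the numerical
issue due to SDP, we have to prove

$$f_1 \ge 0 \wedge \cdots \wedge f_m \ge 0 \Rightarrow \hat{h} > 0, \tag{14}$$
$$g_1 \ge 0 \wedge \cdots \wedge g_n \ge 0 \Rightarrow \hat{h} < 0. \tag{15}$$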

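Regarding (14), let $R_{2d,x,y}$ be a polynomial in $|x|$ and $|y|$ whose total degree
is $2d$ and all of whose coefficients are 1, e.g., $R_{2,x,y} = 1 + |x| + |y| + |x^2| + |xy| + |y^2|$.
Since $S = \{(x, y) \mid f_1 \ge 0 \wedge \cdots \wedge f_m \ge 0\}$ is a compact set, for any polynomial
$p \in \mathbb{R}[x, y]$, $|p|$ is bounded on $S$. Let $M_1$ be an upper bound of $R_{2d,x,y}$ on $S$,
$M_2$ an upper bound of $E_d(x, y)^T E_d(x, y)$ on $S$, and $M_{f_i}$ an upper bound of $f_i$ on $S$.
Then, $|R_{u_i}|$, $|R'_{u_i}|$ and $|R_h|$ are bounded by $\varepsilon M_1$. Let $E_{xy} = E_d(x, y)^T E_d(x, y)$.
So for any $(x_0, y_0) \in S$, considering the polynomials below at $(x_0, y_0)$, by
the first and third lines in (13),

$$
\begin{aligned}
\hat{h} &\ge 1 - R_h + \sum_{i=1}^{m} (\hat{u}_i + R'_{u_i}) f_i - 2\beta E_{xy} \\
&\ge 1 - \varepsilon M_1 + \sum_{i=1}^{m} (\hat{u}_i + R_{u_i} + 2\beta E_{xy} + R'_{u_i} - R_{u_i} - 2\beta E_{xy}) f_i - 2\beta M_2 \\
&= 1 - \varepsilon M_1 - 2\beta M_2 + \sum_{i=1}^{m} (\hat{u}_i + R_{u_i} + 2\beta E_{xy}) f_i + \sum_{i=1}^{m} (R'_{u_i} - R_{u_i} - 2\beta E_{xy}) f_i \\
&\ge 1 - \varepsilon M_1 - 2\beta M_2 + 0 - \sum_{i=1}^{m} (\varepsilon M_1 + \varepsilon M_1 + 2\beta M_2) M_{f_i} \\
&= 1 - \Bigl(2\sum_{i=1}^{m} M_{f_i} + 1\Bigr) M_1 \varepsilon - 2\Bigl(\sum_{i=1}^{m} M_{f_i} + 1\Bigr) M_2 \beta.
\end{aligned}
$$

Whence,

$$f_1 \ge 0 \wedge \cdots \wedge f_m \ge 0 \Rightarrow \hat{h} \ge 1 - \Bigl(2\sum_{i=1}^{m} M_{f_i} + 1\Bigr) M_1 \varepsilon - 2\Bigl(\sum_{i=1}^{m} M_{f_i} + 1\Bigr) M_2 \beta.$$

Let $S' = \{(x, z) \mid g_1 \ge 0 \wedge \cdots \wedge g_n \ge 0\}$, $M_3$ be an upper bound of $R_{2d,x,z}$ on
$S'$, $M_4$ an upper bound of $E_d(x, z)^T E_d(x, z)$ on $S'$, and $M_{g_j}$ an upper bound of
$g_j$ on $S'$. Similarly, it follows that

$$g_1 \ge 0 \wedge \cdots \wedge g_n \ge 0 \Rightarrow -\hat{h} \ge 1 - \Bigl(2\sum_{j=1}^{n} M_{g_j} + 1\Bigr) M_3 \varepsilon - 2\Bigl(\sum_{j=1}^{n} M_{g_j} + 1\Bigr) M_4 \beta.$$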


j=1 j=1 
So, the following proposition follows immediately.

Proposition 2. There exist two positive constants $\gamma_1$ and $\gamma_2$ such that

$$f_1 \ge 0 \wedge \cdots \wedge f_m \ge 0 \Rightarrow \hat{h} \ge 1 - \gamma_1 \varepsilon - \gamma_2 \beta, \tag{16}$$
$$g_1 \ge 0 \wedge \cdots \wedge g_n \ge 0 \Rightarrow -\hat{h} \ge 1 - \gamma_1 \varepsilon - \gamma_2 \beta. \tag{17}$$

Since $\varepsilon$ and $\beta$ heavily rely on the numerical tolerance and the floating point
representation, it is easy to see that $\varepsilon$ and $\beta$ become small enough, with
$\gamma_1 \varepsilon < \frac{1}{2}$ and $\gamma_2 \beta < \frac{1}{2}$, if the numerical tolerance is small enough and the
floating point representation is long enough. This implies

$$f_1 \ge 0 \wedge \cdots \wedge f_m \ge 0 \Rightarrow \hat{h} > 0, \qquad g_1 \ge 0 \wedge \cdots \wedge g_n \ge 0 \Rightarrow -\hat{h} > 0.$$

If so, any numerical result $\hat{h} > 0$ returned by calling an SDP solver on (12)
is guaranteed to be a real interpolant for $\phi$ and $\psi$, i.e., a correct solution to
Problem 1.


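Example 3. Consider the numerical result for Example 2 in Sect. 4. Let $M_{f_1}$,
$M_{f_2}$, $M_{f_3}$, $M_{g_1}$, $M_{g_2}$, $M_{g_3}$, $M_1$, $M_2$, $M_3$, $M_4$ be defined as above. It is easy to
see that

$$f_1 \ge 0 \Rightarrow |x| \le 2 \wedge |y| \le 2 \wedge |z| \le 2 \wedge |a_1| \le 2 \wedge |b_1| \le 2 \wedge |c_1| \le 2 \wedge |d_1| \le 2.$$

Then, by simple calculations, we obtain $M_{f_1} = 4$, $M_{f_2} = 32$, $M_{f_3} = 3$, $M_1 = 83$,
$M_2 = 29$. Thus,

$$\Bigl(2\sum_{i=1}^{3} M_{f_i} + 1\Bigr) M_1 = 6557, \qquad 2\Bigl(\sum_{i=1}^{3} M_{f_i} + 1\Bigr) M_2 = 2320.$$

Also, since

$$g_1 \ge 0 \Rightarrow |x| \le 2 \wedge |y| \le 2 \wedge |z| \le 2 \wedge |a_2| \le 2 \wedge |b_2| \le 2 \wedge |c_2| \le 2 \wedge |d_2| \le 2,$$

we obtain $M_{g_1} = 4$, $M_{g_2} = 7$, $M_{g_3} = 2$, $M_3 = 83$, $M_4 = 29$. Thus,

$$\Bigl(2\sum_{j=1}^{3} M_{g_j} + 1\Bigr) M_3 = 2241, \qquad 2\Bigl(\sum_{j=1}^{3} M_{g_j} + 1\Bigr) M_4 = 812.$$

Consequently, we have $\gamma_1 = 6557$ and $\gamma_2 = 2320$ in Proposition 2.

Due to the fact that the default error tolerance is $10^{-8}$ in the SDP solver
Mosek and $h$ is rounded to 4 decimal places, we have $\varepsilon = \frac{10^{-4}}{2}$. In addition, as
the absolute value of each element in $\tilde{C}$ is less than $10^{3}$, and the dimension
$D$ is less than $10^{3}$, we obtain

$$\beta = \frac{(D+1)\kappa}{1-(D+2)\kappa}\,\mathrm{tr}(\tilde{C}) + 4(D+1)\bigl(2(D+2) + \max_i(\tilde{C}_{ii})\bigr)\eta \le 10^{-7}.$$

Consequently, $\gamma_1 \varepsilon \le 6557 \cdot \frac{10^{-4}}{2} < \frac{1}{2}$ and $\gamma_2 \beta \le 2320 \cdot 10^{-7} < \frac{1}{2}$, which imply that
$h(x, y, z) > 0$ presented in Example 2 is indeed a real interpolant.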
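The arithmetic of Example 3 is easy to re-check mechanically. The sketch below recomputes the four constants from the bounds stated in the example and then tests the two conditions of Proposition 2. The helper name `gammas` is hypothetical, and the numeric values of `eps` and `beta_bound` are assumptions: they are the readings $\varepsilon = 10^{-4}/2$ and $\beta \le 10^{-7}$ adopted for this example, not values confirmed independently.

```python
from fractions import Fraction


def gammas(poly_bounds, m_r, m_e):
    """Return ((2*sum + 1) * m_r, 2 * (sum + 1) * m_e) for the given upper bounds."""
    s = sum(poly_bounds)
    return (2 * s + 1) * m_r, 2 * (s + 1) * m_e


# Bounds M_{f_i}, M_1, M_2 and M_{g_j}, M_3, M_4 as stated in Example 3.
g1_f, g2_f = gammas([4, 32, 3], 83, 29)
g1_g, g2_g = gammas([4, 7, 2], 83, 29)

gamma1 = max(g1_f, g1_g)
gamma2 = max(g2_f, g2_g)

eps = Fraction(1, 2 * 10**4)      # assumed reading: rounding h to 4 decimal places
beta_bound = Fraction(1, 10**7)   # assumed reading of the bound on beta

print("gamma1 =", gamma1, "gamma2 =", gamma2)
print("gamma1 * eps < 1/2:", gamma1 * eps < Fraction(1, 2))
print("gamma2 * beta < 1/2:", gamma2 * beta_bound < Fraction(1, 2))
```

Taking the maximum over the $f$-side and $g$-side constants is what makes a single pair $(\gamma_1, \gamma_2)$ serve both (16) and (17).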


Remark 1. Besides, the result could be verified by the following symbolic
computation procedure instead: computing $P_x(\phi)$ and $P_x(\psi)$ first by some
symbolic tools, such as Redlog [8], which is a package that extends the
computer algebra system REDUCE to a computer logic system; then verifying
$x \in P_x(\phi) \Rightarrow h(x) > 0$ and $x \in P_x(\psi) \Rightarrow h(x) < 0$. For this example, $P_{x,y,z}(\phi)$
and $P_{x,y,z}(\psi)$ obtained by Redlog are too complicated and therefore not
presented here. The symbolic computation can verify that $h(x, y, z)$ in this example
is exactly an interpolant, which confirms our conclusion. Alternatively, we can
also solve the SDP in (9) using an SDP solver with infinite precision [15] and
obtain an exact result. But this only works for problems of small size, because
an SDP solver with infinite precision is essentially based on symbolic computation,
as commented in [15].


6 Generalizing to General Polynomial Formulas

Problem 2. Let $\phi(x, y)$ and $\psi(x, z)$ be two polynomial formulas defined as
follows,

$$\phi(x, y):\ \bigvee_{i=1}^{m} \phi_i,\quad \phi_i = \bigwedge_{k=1}^{K_i} f_{i,k}(x, y) \ge 0; \qquad \psi(x, z):\ \bigvee_{j=1}^{n} \psi_j,\quad \psi_j = \bigwedge_{s=1}^{S_j} g_{j,s}(x, z) \ge 0,$$

where all $f_{i,k}$ and $g_{j,s}$ are polynomials. Suppose $\phi \wedge \psi \models \bot$, and for $i = 1, \ldots, m$,
$j = 1, \ldots, n$, $\{(x, y) \mid \phi_i(x, y)\}$ and $\{(x, z) \mid \psi_j(x, z)\}$ are all semi-algebraic sets
of the Archimedean form. Find a polynomial $h(x)$ such that $h(x) > 0$ is an
interpolant for $\phi$ and $\psi$.


Theorem 9. For Problem 2, there exists a polynomial $h(x)$ satisfying

$$\forall x \in P_x(\phi(x, y)).\ h(x) > 0, \qquad \forall x \in P_x(\psi(x, z)).\ h(x) < 0.$$

Proof. We just need to prove that Lemma 1 holds for Problem 2 as well. Since
$\{(x, y) \mid \phi_i(x, y)\}$ and $\{(x, z) \mid \psi_j(x, z)\}$ are all semi-algebraic sets of the
Archimedean form, $\{(x, y) \mid \phi(x, y)\}$ and $\{(x, z) \mid \psi(x, z)\}$ are both compact.
Taking $\{(x, y) \mid \phi(x, y)\}$ or $\{(x, z) \mid \psi(x, z)\}$ as $S$ in the proof of Lemma 1,
Lemma 1 holds for Problem 2. Thus, the rest of the proof is the same as that
for Theorem 2.

Corollary 3. Let $\phi(x, y)$ and $\psi(x, z)$ be defined as in Problem 2. There must
exist a polynomial $h(x)$ such that $h(x) > 0$ is an interpolant for $\phi$ and $\psi$.


Theorem 10. Let $\phi(x, y)$ and $\psi(x, z)$ be defined as in Problem 2. Then there
exist a polynomial $h(x)$ and $\sum_{i=1}^{m} (K_i + 1) + \sum_{j=1}^{n} (S_j + 1)$ sum of squares
polynomials $u_{i,k}(x, y)$ ($i = 1, \ldots, m$, $k = 1, \ldots, K_i + 1$), $v_{j,s}(x, z)$ ($j = 1, \ldots, n$,
$s = 1, \ldots, S_j + 1$) satisfying the following semi-definite constraints such that $h(x) > 0$
is an interpolant for $\phi(x, y)$ and $\psi(x, z)$:

$$h - 1 = \sum_{k=1}^{K_i} u_{i,k} f_{i,k} + u_{i,K_i+1}, \quad i = 1, \ldots, m, \tag{18}$$
$$-h - 1 = \sum_{s=1}^{S_j} v_{j,s} g_{j,s} + v_{j,S_j+1}, \quad j = 1, \ldots, n. \tag{19}$$


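Proof. By the Archimedean property, the proof is the same as that for
Theorem 5.

Similarly, Problem 2 can be equivalently reformulated as the problem of
searching for sum of squares polynomials satisfying

$$
\begin{aligned}
& h(x) - 1 - \sum_{k=1}^{K_i} u_{i,k} f_{i,k} \in \textstyle\sum \mathbb{R}[x, y]^2, \quad i = 1, \ldots, m; \\
& -h(x) - 1 - \sum_{s=1}^{S_j} v_{j,s} g_{j,s} \in \textstyle\sum \mathbb{R}[x, z]^2, \quad j = 1, \ldots, n; \\
& u_{i,k} \in \textstyle\sum \mathbb{R}[x, y]^2, \quad i = 1, \ldots, m,\ k = 1, \ldots, K_i; \\
& v_{j,s} \in \textstyle\sum \mathbb{R}[x, z]^2, \quad j = 1, \ldots, n,\ s = 1, \ldots, S_j.
\end{aligned} \tag{20}
$$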

Example 4. Consider

$$\phi(x, y, a_1, a_2, b_1, b_2):\ (f_1 \ge 0 \wedge f_2 \ge 0) \vee (f_3 \ge 0 \wedge f_4 \ge 0),$$
$$\psi(x, y, c_1, c_2, d_1, d_2):\ (g_1 \ge 0 \wedge g_2 \ge 0) \vee (g_3 \ge 0 \wedge g_4 \ge 0),$$

where

$$
\begin{aligned}
f_1 &= 16 - (x + y - 4)^2 - 16(x - y)^2 - a_1^2, & f_2 &= x + y - a_2^2 - (2 - a_2)^2, \\
f_3 &= 16 - (x + y + 4)^2 - 16(x - y)^2 - b_1^2, & f_4 &= -x - y - b_2^2 - (2 - b_2)^2, \\
g_1 &= 16 - 16(x + y)^2 - (x - y + 4)^2 - c_1^2, & g_2 &= y - x - c_2^2 - (1 - c_2)^2, \\
g_3 &= 16 - 16(x + y)^2 - (x - y - 4)^2 - d_1^2, & g_4 &= x - y - d_2^2 - (1 - d_2)^2.
\end{aligned}
$$

We get a concrete SDP problem of the form (20) by setting the degree of $h(x, y)$
in (20) to be 2. Using the MATLAB package YALMIP and Mosek, we obtain

$$h(x, y) = -2.3238 + 0.6957x^2 + 0.6957y^2 + 7.6524xy.$$

The result is plotted in Fig. 3, and can be verified either by numerical error
analysis as in Example 2 or by a symbolic procedure like REDUCE as described
in Remark 1.


Example 5 (Ultimate). Consider the following example taken from [5], which is
a challenging benchmark for existing approaches to nonlinear interpolant
generation.

$$\phi = (f_1 \ge 0 \wedge f_2 \ge 0 \vee f_3 \ge 0) \wedge f_4 \ge 0 \wedge f_5 \ge 0 \vee f_6 \ge 0,$$
$$\psi = (g_1 \ge 0 \wedge g_2 \ge 0 \vee g_3 \ge 0) \wedge g_4 \ge 0 \wedge g_5 \ge 0 \vee g_6 \ge 0,$$

where

$$
\begin{aligned}
f_1 &= 3.8025 - x^2 - y^2, & f_2 &= y, \\
f_3 &= 0.9025 - (x - 1)^2 - y^2, & f_4 &= (x - 1)^2 + y^2 - 0.09, \\
f_5 &= (x + 1)^2 + y^2 - 1.1025, & f_6 &= 0.04 - (x + 1)^2 - y^2, \\
g_1 &= 3.8025 - x^2 - y^2, & g_2 &= -y, \\
g_3 &= 0.9025 - (x + 1)^2 - y^2, & g_4 &= (x + 1)^2 + y^2 - 0.09, \\
g_5 &= (x - 1)^2 + y^2 - 1.1025, & g_6 &= 0.04 - (x - 1)^2 - y^2.
\end{aligned}
$$

Fig. 3. Example 4. (Red region: $P_{x,y}(\phi(x, y, a_1, a_2, b_1, b_2))$; green region:
$P_{x,y}(\psi(x, y, c_1, c_2, d_1, d_2))$; gray region: $\{(x, y) \mid h(x, y) > 0\}$.) (Color figure online)

Fig. 4. Example 5. (Red region: $P_{x,y}(\phi(x, y))$; green region: $P_{x,y}(\psi(x, y))$;
gray region: $\{(x, y) \mid h(x, y) > 0\}$.) (Color figure online)

We first convert $\phi$ and $\psi$ to disjunctive normal form:

$$\phi = (f_1 \ge 0 \wedge f_2 \ge 0 \wedge f_4 \ge 0 \wedge f_5 \ge 0) \vee (f_3 \ge 0 \wedge f_4 \ge 0 \wedge f_5 \ge 0) \vee (f_6 \ge 0),$$
$$\psi = (g_1 \ge 0 \wedge g_2 \ge 0 \wedge g_4 \ge 0 \wedge g_5 \ge 0) \vee (g_3 \ge 0 \wedge g_4 \ge 0 \wedge g_5 \ge 0) \vee (g_6 \ge 0).$$

We get a concrete SDP problem of the form (20) by setting the degree of $h(x, y)$
in (20) to be 7. Using the MATLAB package YALMIP and Mosek, and keeping four
decimal places, we obtain

$$
\begin{aligned}
h(x, y) ={} & 1297.5980x + 191.3260y - 3172.9653x^3 + 196.5763x^2y + 2168.1739xy^2 \\
& + 1045.7373y^3 + 1885.8986x^5 - 1009.6275x^4y + 3205.3793x^3y^2 - 1403.5431x^2y^3 \\
& + 1842.0669xy^4 + 1075.2003y^5 - 222.0698x^7 + 547.9542x^6y - 704.7474x^5y^2 \\
& + 1724.7008x^4y^3 - 728.2229x^3y^4 + 1775.7548x^2y^5 - 413.3771xy^6 + 1210.2617y^7.
\end{aligned}
$$

The result is plotted in Fig. 4, and can be verified either by numerical error
analysis as in Example 2 or by a symbolic procedure like REDUCE as described
in Remark 1.
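As a quick sanity check of Example 5 (not a substitute for the error analysis or the symbolic verification mentioned above), the computed polynomial can be evaluated on points of the two regions. The sketch below encodes $h$, $\phi$ and $\psi$ in their disjunctive normal form, with the exponents of $h$ read as the odd-degree monomials of total degree 1, 3, 5 and 7, and scans a grid. The function names, the grid step and the scanned box $[-2, 2]^2$ are illustrative choices, not part of the original experiment.

```python
def h(x, y):
    """Degree-7 polynomial reported for Example 5 (coefficients to 4 decimals)."""
    return (1297.5980 * x + 191.3260 * y
            - 3172.9653 * x**3 + 196.5763 * x**2 * y + 2168.1739 * x * y**2
            + 1045.7373 * y**3
            + 1885.8986 * x**5 - 1009.6275 * x**4 * y + 3205.3793 * x**3 * y**2
            - 1403.5431 * x**2 * y**3 + 1842.0669 * x * y**4 + 1075.2003 * y**5
            - 222.0698 * x**7 + 547.9542 * x**6 * y - 704.7474 * x**5 * y**2
            + 1724.7008 * x**4 * y**3 - 728.2229 * x**3 * y**4
            + 1775.7548 * x**2 * y**5 - 413.3771 * x * y**6 + 1210.2617 * y**7)


def phi(x, y):
    f1 = 3.8025 - x**2 - y**2
    f2 = y
    f3 = 0.9025 - (x - 1)**2 - y**2
    f4 = (x - 1)**2 + y**2 - 0.09
    f5 = (x + 1)**2 + y**2 - 1.1025
    f6 = 0.04 - (x + 1)**2 - y**2
    return ((f1 >= 0 and f2 >= 0 and f4 >= 0 and f5 >= 0)
            or (f3 >= 0 and f4 >= 0 and f5 >= 0)
            or f6 >= 0)


def psi(x, y):
    # psi is phi rotated by 180 degrees about the origin.
    return phi(-x, -y)


def violations(step=0.05, lim=2.0):
    """Grid points where phi holds but h <= 0, or psi holds but h >= 0."""
    n = int(round(2 * lim / step))
    bad = []
    for i in range(n + 1):
        for j in range(n + 1):
            x, y = -lim + i * step, -lim + j * step
            if (phi(x, y) and h(x, y) <= 0) or (psi(x, y) and h(x, y) >= 0):
                bad.append((x, y))
    return bad


print("h(-1, 0) =", h(-1.0, 0.0), " h(1, 0) =", h(1.0, 0.0))
print("grid violations found:", len(violations()))
```

Since $h$ contains only odd-degree monomials and $\psi$ is the point reflection of $\phi$, a check on the $\phi$ side mirrors to the $\psi$ side.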


7 Application to Invariant Generation 


In this section, as an application, we sketch how to apply our approach to
invariant generation in program verification; the details can be found in [11].

In [22], Lin et al. proposed a framework for invariant generation using weakest
precondition, strongest postcondition and interpolation, which consists of two
procedures, i.e., synthesizing invariants by forward interpolation based on strongest
postcondition and interpolant generation, and by backward interpolation based
on weakest precondition and interpolant generation. In [22], only linear
invariants can be synthesized, as no powerful approaches are available to synthesize
nonlinear interpolants. Our results can strengthen their framework by
allowing it to generate nonlinear invariants. For example, we can revise the
procedure Squeezing Invariant - Forward in their framework and obtain Algorithm 1.
The major revisions include:

– firstly, we exploit our method to synthesize interpolants, see line 4 in
Algorithm 1;

– secondly, we add a conditional statement for $A_{i+1}$ at lines 7-10 in Algorithm 1
in order to make $A_{i+1}$ Archimedean.

The procedure Squeezing Invariant - Backward can be revised similarly.

Algorithm 1. Revised Squeezing Invariant - Forward
Input: An annotated loop: {P} while ρ do C {Q}, where P and Q are Archimedean
Output: (yes/no, I), where I is a loop invariant

 1: A_0 ← P; B_0 ← (¬ρ ∧ ¬Q); i ← 0; j ← 0;
 2: while ⊤ do
 3:   if (⋁_{k=0}^{i} A_k) ∧ B_j is not satisfiable, (⋁_{k=0}^{i} A_k) and B_j are Archimedean then
 4:     call our method to synthesize an interpolant for (⋁_{k=0}^{i} A_k) and B_j, say I_i;
        {Use our method to generate interpolant}
 5:     if {I_i ∧ ρ} C {I_i} then
 6:       return (yes, I_i);
 7:     else if I_i is Archimedean then
 8:       A_{i+1} ← sp(I_i ∧ ρ, C);
 9:     else
10:       A_{i+1} ← sp(A_i ∧ ρ, C);
11:     end if
        {sp: a predicate transformer to compute the strongest postcondition of C w.r.t. I_i ∧ ρ}
12:     i ← i + 1; B_{j+1} ← B_0 ∨ (ρ ∧ wp(C, B_j));
        {wp: a predicate transformer to compute the weakest precondition of C w.r.t. B_j}
13:     j ← j + 1;
14:   else if A_i is concrete then
15:     return (no, ⊥);
16:   else
17:     while A_i is not concrete do
18:       i ← i − 1;
19:     end while
20:     A_{i+1} ← sp(A_i ∧ ρ, C); i ← i + 1;
21:   end if

22: end while

Example 6. Consider a loop program given in Algorithm 2 for controlling the
acceleration of a car, adapted from [21]. Suppose we know that vc is in [0, 40] at
the beginning of the loop; we would like to prove that vc < 49.61 holds after the
loop. Since the loop guard is unknown, the loop may terminate
after any number of iterations.

We apply Algorithm 1 to compute an invariant which ensures that
vc < 49.61 holds. Since vc is the velocity of the car, 0 ≤ vc < 49.61 is required to hold
in order to maintain safety. Via Algorithm 1, we have A_0 = {vc | vc(40 − vc) ≥ 0}
and B = {vc | vc < 0} ∪ {vc | vc ≥ 49.61}. Here, we replace B with B' =
[−2, −1] ∪ [49.61, 55], i.e., B' = {vc | (vc + 2)(−1 − vc) ≥ 0 ∨ (vc − 49.61)(55 − vc) ≥ 0},
in order to bring it into Archimedean form.

Firstly, it is evident that A_0 : vc(40 − vc) ≥ 0 implies A_0 ∧ B' ⊨ ⊥. By
applying our approach, we obtain an interpolant

I_0 : 1.4378 + 3.3947 * vc − 0.083 * vc^2 > 0

for A_0 and B'. It can be verified that {I_0} C {I_0} (line 5) does not hold, where
C stands for the loop body.

Secondly, by setting A_1 = sp(I_0, C) (line 8) and re-calling our approach, we
obtain an interpolant

I_1 : 2.0673 + 3.0744 * vc − 0.0734 * vc^2 > 0

for A_0 ∪ A_1 and B'. Likewise, it can be verified that {I_1} C {I_1} (line 5) does
not hold.

Algorithm 2. Control code for accelerating a car
1: /* Pre: vc ∈ [0, 40] */
2: while unknown do
3:   fa ← 0.5418 * vc * vc;
4:   fr ← 1000 − fa;
5:   ac ← 0.0005 * fr;
6:   vc ← vc + ac;
7: end while
8: /* Post: vc < 49.61 */


Thirdly, repeating the above procedure again, we obtain an interpolant

I_2 : 2.2505 + 2.7267 * vc − 0.063 * vc^2 > 0,

and it can be verified that {I_2} C {I_2} holds, implying that I_2 is an invariant.
Moreover, it is trivial to verify that I_2 ⇒ vc < 49.61.
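The three inductiveness claims of Example 6 can be re-checked numerically. The sketch below encodes the loop body of Algorithm 2 as `step` and represents each candidate $c_0 + c_1 \cdot vc - c_2 \cdot vc^2 > 0$ by the open interval between its two real roots. It relies on one observation that is not stated in the text: the update $vc \mapsto vc + 0.0005\,(1000 - 0.5418\,vc^2)$ is increasing for $vc < 1/(2 \cdot 0.0005 \cdot 0.5418) \approx 1845.7$, so on such an interval it suffices to look at the two endpoints. The helper names are hypothetical, and the check uses ordinary floating-point arithmetic, so it is a plausibility test rather than a proof.

```python
import math


def step(vc):
    """One iteration of the loop body of Algorithm 2."""
    fa = 0.5418 * vc * vc
    fr = 1000 - fa
    ac = 0.0005 * fr
    return vc + ac


def positive_interval(c0, c1, c2):
    """Open interval on which c0 + c1*v - c2*v**2 > 0, assuming c2 > 0."""
    disc = math.sqrt(c1 * c1 + 4 * c2 * c0)
    return (c1 - disc) / (2 * c2), (c1 + disc) / (2 * c2)


def is_inductive(lo, hi):
    """True if step maps the open interval (lo, hi) into itself."""
    # step is increasing below this threshold, so the endpoints decide.
    assert hi < 1 / (2 * 0.0005 * 0.5418)
    return step(lo) >= lo and step(hi) <= hi


I0 = positive_interval(1.4378, 3.3947, 0.083)
I1 = positive_interval(2.0673, 3.0744, 0.0734)
I2 = positive_interval(2.2505, 2.7267, 0.063)

for name, interval in (("I0", I0), ("I1", I1), ("I2", I2)):
    print(name, interval, "inductive:", is_inductive(*interval))
```

Only the last candidate passes, and its upper root lies below 49.61 while the initial range [0, 40] lies inside it, which matches the narrative of the example.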

Consequently, we have the conclusion that I_2 is an inductive invariant which
witnesses the correctness of the loop.

8 Conclusion

In this paper we propose a sound and complete method to synthesize Craig
interpolants for mutually contradictory polynomial formulas $\phi(x, y)$ and $\psi(x, z)$ of
the form $f_1 \ge 0 \wedge \cdots \wedge f_n \ge 0$, where the $f_i$ are polynomials in $x, y$ or $x, z$ and the
quadratic module generated by the $f_i$ is Archimedean. The interpolant is generated
by solving a semi-definite programming problem, which is a generalization of the
method in [7] dealing with mutually contradictory formulas with the same set of
variables and the method in [10] dealing with mutually contradictory formulas
with concave quadratic polynomial inequalities. As an application, we apply our
approach to invariant generation in program verification.
As future work, we would like to consider interpolant synthesis for
formulas with strict polynomial inequalities. It is also worth considering how to
synthesize interpolants for the combination of non-linear formulas and other
theories based on our approach and other existing ones, as well as further
applications to the verification of programs and hybrid systems.

Abstract. Given an unsatisfiable formula $F$ in CNF, i.e. a set of clauses,
the problem of Minimal Unsatisfiable Subset (MUS) seeks to identify
a minimal subset of clauses $N \subseteq F$ such that $N$ is unsatisfiable. The
emerging viewpoint of MUSes as the root causes of unsatisfiability has
led MUSes to find applications in a wide variety of diagnostic approaches.
Recent advances in identification and enumeration of MUSes have
motivated researchers to discover applications that can benefit from rich
information about the set of MUSes. One such extension is that of counting
the number of MUSes. The current best approach for MUS counting is
to employ a MUS enumeration algorithm, which often does not scale for
the cases with a reasonably large number of MUSes.

Motivated by the success of hashing-based techniques in the context
of model counting, we design the first approximate MUS counting
procedure with $(\varepsilon, \delta)$ guarantees, called AMUSIC. Our approach avoids
exhaustive MUS enumeration by combining the classical technique of
universal hashing with advances in QBF solvers along with a novel usage of
union and intersection of MUSes to achieve runtime efficiency. Our
prototype implementation of AMUSIC is shown to scale to instances that
were clearly beyond the realm of enumeration-based approaches.


1 Introduction 


Given an unsatisfiable Boolean formula $F$ as a set of clauses $\{f_1, f_2, \ldots, f_n\}$, also
known as conjunctive normal form (CNF), a set $N$ of clauses is a Minimal
Unsatisfiable Subset (MUS) of $F$ iff $N \subseteq F$, $N$ is unsatisfiable, and for each $f \in N$
the set $N \setminus \{f\}$ is satisfiable. Since MUSes can be viewed as representing the
minimal reasons for unsatisfiability of a formula, MUSes have found applications
in a wide variety of domains ranging from diagnosis [45], ontologies debugging [1],
spreadsheet debugging [29], formal equivalence checking [20], constrained
counting and sampling [28], and the like. As the scalable techniques for identification
of MUSes appeared only about a decade and a half ago, the earliest applications
primarily focused on a reduction to the identification of a single MUS or a
small set of MUSes. With an improvement in the scalability of MUS
identification techniques, researchers have now sought to investigate extensions of MUSes
and their corresponding applications. One such extension is MUS counting, i.e.,
counting the number of MUSes of $F$. Hunter and Konieczny [26], Mu [45], and
Thimm [56] have shown that the number of MUSes can be used to compute
different inconsistency metrics for general propositional knowledge bases.

In contrast to the progress in the design of efficient MUS identification tech- 
niques, the work on MUS counting is still in its nascent stages. Reminiscent of 
the early days of model counting, the current approach for MUS counting is to 
employ a complete MUS enumeration algorithm, e.g., [3,12,34,55], to explicitly 
identify all MUSes. As noted in Sect. 2, there can be up to exponentially many 
MUSes of F w.r.t. |F|, and thus their complete enumeration can be practically 
intractable. Indeed, contemporary MUS enumeration algorithms often cannot 
complete the enumeration within a reasonable time [10,12,34,47]. In this con- 
text, one wonders: whether it is possible to design a scalable MUS counter without 
performing explicit enumeration of MUSes? 

The primary contribution of this paper is a probabilistic counter, called
AMUSIC, that takes in a formula $F$, tolerance parameter $\varepsilon$, confidence parameter
$\delta$, and returns an estimate guaranteed to be within $(1 + \varepsilon)$-multiplicative factor
of the exact count with confidence at least $1 - \delta$. Crucially, for $F$ defined over $n$
clauses, AMUSIC explicitly identifies only $O(\log n \cdot \log(1/\delta) \cdot \varepsilon^{-2})$ many MUSes
even though the number of MUSes can be exponential in $n$.

The design of AMUSIC is inspired by recent successes in the design of efficient 
XOR hashing-based techniques [15,17] for the problem of model counting, i.e., 
given a Boolean formula G, compute the number of models (also known as 
solutions) of G. We observe that both the problems are defined over a power-set 
structure. In MUS counting, the goal is to count MUSes in the power-set of F, 
whereas in model counting, the goal is to count models in the power-set that 
represents all valuations of variables of G. Chakraborty et al. [18,52] proposed an 
algorithm, called ApproxMC, for approximate model counting that also provides 
the $(\varepsilon, \delta)$ guarantees. ApproxMC is currently in its third version, ApproxMC3 [52].
The base idea of ApproxMC3 is to partition the power-set into nCells small cells,
then pick one of the cells, and count the number inCell of models in the cell. The
total model count is then estimated as nCells × inCell. Our algorithm for MUS
counting is based on ApproxMC3. We adopt the high-level idea to partition the 
power-set of F into small cells and then estimate the total MUS count based on a 
MUS count in a single cell. The difference between ApproxMC3 and AMUSIC lies 
in the way of counting the target elements (models vs. MUSes) in a single cell; 
we propose novel MUS specific techniques to deal with this task. In particular, 
our contribution is the following: 


– We introduce a QBF (quantified Boolean formula) encoding for the problem
of counting MUSes in a single cell and use a $\Sigma_3^P$ oracle to solve it.

– Let $\mathrm{UMU}_F$ and $\mathrm{IMU}_F$ be the union and the intersection of all MUSes of $F$,
respectively. We observe that every MUS of $F$ (1) contains $\mathrm{IMU}_F$ and (2) is
contained in $\mathrm{UMU}_F$. Consequently, if we determine the sets $\mathrm{UMU}_F$ and $\mathrm{IMU}_F$,
then we can significantly speed up the identification of MUSes in a cell.


Approximate Counting of Minimal Unsatisfiable Subsets 441 


– We propose novel approaches for computing the union $\mathrm{UMU}_F$ and the
intersection $\mathrm{IMU}_F$ of all MUSes of $F$.

– We implement AMUSIC and conduct an extensive empirical evaluation on
a set of scalable benchmarks. We observe that AMUSIC is able to compute
estimates for problems clearly beyond the reach of existing enumeration-based
techniques. We experimentally evaluate the accuracy of AMUSIC. In
particular, we observe that the estimates computed by AMUSIC are significantly
closer to the true count than the theoretical guarantees provided by AMUSIC.


Our work opens up several new interesting avenues of research. From a
theoretical perspective, we make polynomially many calls to a $\Sigma_3^P$ oracle while
the problem of finding a MUS is known to be in $FP^{NP}$, i.e. a MUS can be
found in polynomial time by executing a polynomial number of calls to an
NP-oracle [19,39]. Contrasting this to model counting techniques, where an approximate
counter makes polynomially many calls to an NP-oracle when the underlying
problem of finding a satisfying assignment is NP-complete, a natural question is
to close the gap and seek to design a MUS counting algorithm with polynomially
many invocations of an $FP^{NP}$ oracle. From a practitioner's perspective, our work
calls for a design of MUS techniques with native support for XORs; the pursuit
of native support for XOR in the context of SAT solvers has led to an exciting
line of work over the past decade [52,53].


2 Preliminaries and Problem Formulation 


A Boolean formula $F = \{f_1, f_2, \ldots, f_n\}$ in a conjunctive normal form (CNF)
is a set of Boolean clauses over a set of Boolean variables Vars($F$). A Boolean
clause is a set $\{l_1, l_2, \ldots, l_k\}$ of literals. A literal is either a variable $x \in$ Vars($F$)
or its negation $\neg x$. A truth assignment $I$ to the variables Vars($F$) is a mapping
Vars($F$) $\to \{1, 0\}$. A clause $f \in F$ is satisfied by an assignment $I$ iff $I(l) = 1$
for some $l \in f$ or $I(k) = 0$ for some $\neg k \in f$. The formula $F$ is satisfied by $I$
iff $I$ satisfies every $f \in F$; in such a case $I$ is called a model of $F$. Finally, $F$ is
satisfiable if it has a model; otherwise $F$ is unsatisfiable.

A QBF is a Boolean formula where each variable is either universally ($\forall$) or
existentially ($\exists$) quantified. We write $Q_1 \cdots Q_k$-QBF, where $Q_1, \ldots, Q_k \in \{\forall, \exists\}$,
to denote the class of QBF with a particular type of alternation of the quantifiers,
e.g., $\exists\forall$-QBF or $\exists\forall\exists$-QBF. Every QBF is either true (valid) or false (invalid).
The problem of deciding validity of a formula in $Q_1 \cdots Q_k$-QBF where $Q_1 = \exists$
is $\Sigma_k^P$-complete [43].

When it is clear from the context, we write just formula to denote either
a QBF or a Boolean formula in CNF. Moreover, throughout the whole text, we
use $F$ to denote the input Boolean formula in CNF. Furthermore, we will use
capital letters, e.g., $S$, $K$, $N$, to denote other CNF formulas, small letters, e.g.,
$f$, $f_1$, $f_i$, to denote clauses, and small letters, e.g., $x$, $x'$, $y$, to denote variables.

Given a set $X$, we write $\mathcal{P}(X)$ to denote the power-set of $X$, and $|X|$ to denote
the cardinality of $X$. Finally, we write $\Pr[O : P]$ to denote the probability of an
outcome $O$ when sampling from a probability space $P$. When $P$ is clear from the
context, we write just $\Pr[O]$.

Fig. 1. Illustration of the power set of the formula $F$ from Example 1. We denote
individual subsets of $F$ using the bit-vector representation. The subsets with a dashed
border are the unsatisfiable subsets, and the others are satisfiable subsets. The MUSes
are filled with a background color. (Color figure online)


Minimal Unsatisfiability 


Definition 1 (MUS). A set $N$, $N \subseteq F$, is a minimal unsatisfiable subset
(MUS) of $F$ iff $N$ is unsatisfiable and for all $f \in N$ the set $N \setminus \{f\}$ is satisfiable.


Note that the minimality concept used here is set minimality, not minimum
cardinality. Therefore, there can be MUSes with different cardinalities. In
general, there can be up to exponentially many MUSes of $F$ w.r.t. $|F|$ (see
Sperner's theorem [54]). We use $\mathrm{AMU}_F$ to denote the set of all MUSes of $F$.
Furthermore, we write $\mathrm{UMU}_F$ and $\mathrm{IMU}_F$ to denote the union and the intersection of all
MUSes of $F$, respectively. Finally, note that every subset $S$ of $F$ can be expressed
as a bit-vector over the alphabet $\{0, 1\}$; for example, if $F = \{f_1, f_2, f_3, f_4\}$ and
$S = \{f_1, f_4\}$, then the bit-vector representation of $S$ is 1001.


Definition 2. Let $N$ be an unsatisfiable subset of $F$ and $f \in N$. The clause $f$
is necessary for $N$ iff $N \setminus \{f\}$ is satisfiable.

The necessary clauses are sometimes also called transition [6] or critical [2]
clauses. Note that a set $N$ is a MUS iff every $f \in N$ is necessary for $N$. Also,
note that a clause $f \in F$ is necessary for $F$ iff $f \in \mathrm{IMU}_F$.


Example 1. We demonstrate the concepts on an example, illustrated in Fig. 1.
Assume that $F = \{f_1 = \{x_1\}, f_2 = \{\neg x_1\}, f_3 = \{x_2\}, f_4 = \{\neg x_1, \neg x_2\}\}$. In this
case, $\mathrm{AMU}_F = \{\{f_1, f_2\}, \{f_1, f_3, f_4\}\}$, $\mathrm{IMU}_F = \{f_1\}$, and $\mathrm{UMU}_F = F$.
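For a formula this small, the sets in Example 1 can be recomputed by brute force. The sketch below is a naive illustration of Definitions 1 and 2, written for this four-clause example only; it enumerates all subsets and all assignments and is in no way the algorithm proposed in this paper. The clause encoding (signed integers for literals) and all function names are illustrative assumptions.

```python
from itertools import combinations, product

# Clause i is a set of nonzero ints: k stands for x_k, -k for the negation of x_k.
F = {1: {1}, 2: {-1}, 3: {2}, 4: {-1, -2}}
NUM_VARS = 2


def satisfiable(names):
    """Brute-force SAT check for the clauses of F whose indices are in names."""
    for bits in product([False, True], repeat=NUM_VARS):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in F[n]) for n in names):
            return True
    return False


def is_mus(names):
    """Definition 1: unsatisfiable, and every clause is necessary."""
    return not satisfiable(names) and all(satisfiable(names - {n}) for n in names)


def all_muses():
    ids = sorted(F)
    return [set(c) for r in range(1, len(ids) + 1)
            for c in combinations(ids, r) if is_mus(set(c))]


def bitvec(subset):
    """Bit-vector representation of a subset of F."""
    return "".join("1" if n in subset else "0" for n in sorted(F))


muses = all_muses()
imu = set.intersection(*muses)
umu = set.union(*muses)
# Definition 2 applied to F itself: f is necessary iff F \ {f} is satisfiable.
necessary = {n for n in F if satisfiable(set(F) - {n})}

print("MUSes:", [bitvec(m) for m in muses])
print("IMU:", imu, "UMU:", umu, "necessary for F:", necessary)
```

The last line also illustrates the remark after Definition 2: the clauses necessary for $F$ coincide with $\mathrm{IMU}_F$.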


Hash Functions 


Let $n$ and $m$ be positive integers such that $m < n$. By $\{1, 0\}^n$ we denote the set
of all bit-vectors of length $n$ over the alphabet $\{1, 0\}$. Given a vector $v \in \{1, 0\}^n$
and $i \in \{1, \ldots, n\}$, we write $v[i]$ to denote the $i$-th bit of $v$. A hash function $h$
from a family $H_{xor}(n, m)$ of hash functions maps $\{1, 0\}^n$ to $\{1, 0\}^m$. The family
$H_{xor}(n, m)$ is defined as $\{h \mid h(y)[i] = a_{i,0} \oplus (\bigoplus_{k=1}^{n} a_{i,k} \wedge y[k]) \text{ for all } 1 \le i \le m\}$,
where $\oplus$ and $\wedge$ denote the Boolean XOR and AND operators, respectively, and
$a_{i,k} \in \{1, 0\}$ for all $1 \le i \le m$ and $0 \le k \le n$.

To choose a hash function uniformly at random from $H_{xor}(n, m)$, we
randomly and independently choose the values of $a_{i,k}$. It has been shown [24]
that the family $H_{xor}(n, m)$ is pairwise independent, also known as strongly
2-universal. In particular, let us by $h \leftarrow H_{xor}(n, m)$ denote the probability space
obtained by choosing a hash function $h$ uniformly at random from $H_{xor}(n, m)$.
The property of pairwise independence guarantees that for all $\alpha_1, \alpha_2 \in \{1, 0\}^m$
and for all distinct $y_1, y_2 \in \{1, 0\}^n$,
$\Pr[\bigwedge_{i=1}^{2} h(y_i) = \alpha_i : h \leftarrow H_{xor}(n, m)] = 2^{-2m}$.

We say that a hash function $h \in H_{xor}(n, m)$ partitions $\{0, 1\}^n$ into $2^m$ cells.
Furthermore, given a hash function $h \in H_{xor}(n, m)$ and a cell $\alpha \in \{1, 0\}^m$ of $h$,
we define their prefix-slices. In particular, for every $k \in \{1, \ldots, m\}$, the $k$-th prefix
of $h$, denoted $h^{(k)}$, is a map from $\{1, 0\}^n$ to $\{1, 0\}^k$ such that $h^{(k)}(y)[i] = h(y)[i]$
for all $y \in \{1, 0\}^n$ and for all $i \in \{1, \ldots, k\}$. Similarly, the $k$-th prefix of $\alpha$, denoted
$\alpha^{(k)}$, is an element of $\{1, 0\}^k$ such that $\alpha^{(k)}[i] = \alpha[i]$ for all $i \in \{1, \ldots, k\}$.
Intuitively, a cell $\alpha^{(k)}$ of $h^{(k)}$ originates by merging the two cells of $h^{(k+1)}$ that
differ only in the last bit.

In our work, we use hash functions from the family $H_{xor}(n, m)$ to partition
the power-set $\mathcal{P}(F)$ of the given Boolean formula $F$ into $2^m$ cells. Furthermore,
given a cell $\alpha \in \{0, 1\}^m$, let us by $\mathrm{AMU}_{\langle F,h,\alpha\rangle}$ denote the set of all MUSes in the
cell $\alpha$; formally, $\mathrm{AMU}_{\langle F,h,\alpha\rangle} = \{M \in \mathrm{AMU}_F \mid h(\mathrm{bit}(M)) = \alpha\}$, where $\mathrm{bit}(M)$ is the
bit-vector representation of $M$. The following observation is crucial for our work.

Observation 1. For every formula $F$, $m \in \{1, \ldots, |F| - 1\}$, $h \leftarrow H_{xor}(|F|, m)$,
and $\alpha \in \{0, 1\}^m$ it holds that $\mathrm{AMU}_{\langle F,h^{(i)},\alpha^{(i)}\rangle} \supseteq \mathrm{AMU}_{\langle F,h^{(j)},\alpha^{(j)}\rangle}$ for every $i < j$.


Example 2. Assume that we are given a formula $F$ such that $|F| = 4$ and a hash
function $h \in H_{xor}(4, 2)$ that is defined via the following values of individual $a_{i,k}$:

$$a_{1,0} = 0,\ a_{1,1} = 1,\ a_{1,2} = 1,\ a_{1,3} = 0,\ a_{1,4} = 1$$
$$a_{2,0} = 0,\ a_{2,1} = 1,\ a_{2,2} = 0,\ a_{2,3} = 0,\ a_{2,4} = 1$$

The hash function partitions $\mathcal{P}(F)$ into 4 cells. For example, $h(1100) = 01$
since $h(1100)[1] = 0 \oplus (1 \wedge 1) \oplus (1 \wedge 1) \oplus (0 \wedge 0) \oplus (1 \wedge 0) = 0$ and $h(1100)[2] =
0 \oplus (1 \wedge 1) \oplus (0 \wedge 1) \oplus (0 \wedge 0) \oplus (1 \wedge 0) = 1$. Figure 2 illustrates the whole partition
and also illustrates the partition given by the prefix $h^{(1)}$ of $h$.
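The computation in Example 2 can be reproduced with a few lines of Python. The sketch below stores the coefficients $a_{i,k}$ row by row (with $a_{i,0}$ first), evaluates $h$ and its prefixes on bit-vectors given as strings, and groups all 16 bit-vectors of length 4 into cells. The function names and the string encoding are illustrative choices.

```python
from itertools import product

# Row i holds a_{i,0}, a_{i,1}, ..., a_{i,4} from Example 2.
A = [[0, 1, 1, 0, 1],
     [0, 1, 0, 0, 1]]


def xor_hash(a, y, k=None):
    """Evaluate h(y), or its k-th prefix h^(k)(y), for a bit-string y."""
    rows = a if k is None else a[:k]
    out = []
    for row in rows:
        bit = row[0]
        for coeff, ybit in zip(row[1:], y):
            bit ^= coeff & int(ybit)
        out.append(str(bit))
    return "".join(out)


def cells(a, n, k=None):
    """Partition {0,1}^n into cells keyed by the (prefix) hash value."""
    part = {}
    for bits in product("01", repeat=n):
        y = "".join(bits)
        part.setdefault(xor_hash(a, y, k), []).append(y)
    return part


full = cells(A, 4)          # partition induced by h = h^(2)
prefix = cells(A, 4, k=1)   # partition induced by h^(1)

print("h(1100) =", xor_hash(A, "1100"))
for cell in sorted(full):
    print("cell", cell, "->", full[cell])
```

Comparing `prefix` with `full` shows the merging described above: each cell of $h^{(1)}$ is the union of the two cells of $h^{(2)}$ that share its first bit.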


2.1 Problem Definitions 


In this paper, we are concerned with the following problems. 

Name: $(\varepsilon, \delta)$-#MUS problem
Input: A formula $F$, a tolerance $\varepsilon > 0$, and a confidence $1 - \delta \in (0, 1]$.
Output: A number $c$ such that $\Pr[|\mathrm{AMU}_F|/(1 + \varepsilon) \le c \le |\mathrm{AMU}_F| \cdot (1 + \varepsilon)] \ge 1 - \delta$.

Fig. 2. Illustration of the partition of $\mathcal{P}(F)$ by $h = h^{(2)}$ and $h^{(1)}$ from Example 2:
(a) $h^{(2)} = h$ with 4 cells; (b) $h^{(1)}$ with 2 cells. In
the case of $h$, we use 4 colors, orange, pink, white, and blue, to highlight its four cells.
In the case of $h^{(1)}$, there are only two cells: the white and the blue cells are merged into
a white cell, and the pink and the orange cells are merged into an orange cell. (Color
figure online)

Name: MUS-membership problem 
Input: A formula F and a clause f € F. 
Output: True if there isa MUS M € AMUp such that f € M and False otherwise. 


Name: MUS-union problem 
Input: A formula F. 
Output: The union UMUp of all MUSes of F. 


Name: MUS-intersection problem 
Input: A formula F. 
Output: The intersection IMUp of all MUSes of F. 


Name: (e€, 6)-#SAT problem 

Input: A formula F, a tolerance € > 0, and a confidence 1 — 6 € (0, 1]. 
Output: A number m such that Pr[m/(1+ 6) <c<m-(1+e)|] > 1-6, where 
m is the number of models of F. 


The main goal of this paper is to provide a solution to the (ε, δ)-#MUS problem.
We also deal with the MUS-membership, MUS-union, and MUS-intersection
problems since these problems emerge in our approach for solving the (ε, δ)-#MUS
problem. Finally, we do not focus on solving the (ε, δ)-#SAT problem; however,
the problem is closely related to the (ε, δ)-#MUS problem.
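To make the objects behind these problem definitions concrete, the following brute-force sketch computes the set of all MUSes, their union, and their intersection for a toy formula. It is exponential and only meant as an illustration; the toy formula and the helper names `is_sat` and `all_muses` are our own assumptions, not part of the approach described in this paper.

```python
from itertools import combinations, product

def is_sat(clauses):
    """Brute-force satisfiability check; literals are signed integers."""
    variables = sorted({abs(l) for c in clauses for l in c})
    for bits in product([False, True], repeat=len(variables)):
        val = dict(zip(variables, bits))
        if all(any(val[abs(l)] == (l > 0) for l in c) for c in clauses):
            return True
    return False

def all_muses(formula):
    """Enumerate all MUSes as frozensets of clause indices, smallest first."""
    muses = []
    for size in range(1, len(formula) + 1):
        for subset in map(frozenset, combinations(range(len(formula)), size)):
            if any(m <= subset for m in muses):
                continue  # contains a smaller MUS, so it cannot be minimal
            if not is_sat([formula[i] for i in subset]):
                muses.append(subset)
    return muses

# Toy formula: f0 = x1, f1 = not x1, f2 = x2, f3 = (not x1 or not x2), f4 = x3
F = [(1,), (-1,), (2,), (-1, -2), (3,)]
amu = all_muses(F)                                  # the set of all MUSes
umu = frozenset().union(*amu)                       # MUS-union
imu = frozenset(range(len(F))).intersection(*amu)   # MUS-intersection

print(len(amu), sorted(umu), sorted(imu))
```

For this formula the MUSes are {f0, f1} and {f0, f2, f3}, so f4 is in no MUS and f0 is in every MUS.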


3 Related Work 


It is well-known (see e.g., [21,36,51]) that a clause f ∈ F belongs to IMU_F iff f is
necessary for F. Therefore, to compute IMU_F, one can simply check each f ∈ F
for being necessary for F. We are not aware of any work that has focused on the
MUS-intersection problem in more detail.

The MUS-union problem was recently investigated by Mencía et al. [42]. Their
algorithm is based on gradually refining an under-approximation of UMU_F until
the exact UMU_F is computed. Unfortunately, the authors experimentally show
that their algorithm often fails to find the exact UMU_F within a reasonable time
even for relatively small input instances (only an under-approximation is
computed). In our work, we propose an approach that works the other way around: we
start with an over-approximation of UMU_F and gradually refine the approximation
to eventually get UMU_F. Related research was also conducted by Janota and
Marques-Silva [30], who proposed several QBF encodings for solving the
MUS-membership problem. Although they did not focus on finding UMU_F, one can
clearly identify UMU_F by solving the MUS-membership problem for each f ∈ F.

As for counting the number of MUSes of F, we are not aware of any previous
work dedicated to this problem. Yet, plenty of algorithms and tools
(e.g., [3,9,11,12,35,47]) have been proposed for enumerating/identifying all MUSes
of F. Clearly, if we enumerate all MUSes of F, then we obtain the exact value of
|AMU_F|, and thus we also solve the (ε, δ)-#MUS problem. However, since there can
be up to exponentially many MUSes w.r.t. |F|, MUS enumeration algorithms
are often not able to complete the enumeration in a reasonable time and thus
are not able to find the value of |AMU_F|.

Very similar to the (ε, δ)-#MUS problem is the (ε, δ)-#SAT problem. Both
problems involve the same probabilistic and approximation guarantees. Moreover,
both problems are defined over a power-set structure. In MUS counting,
the goal is to count MUSes in P(F), whereas in model counting, the goal is to
count models in P(Vars(F)). In this paper, we propose an algorithm for solving
the (ε, δ)-#MUS problem that is based on ApproxMC3 [15,17,52]. In particular,
we keep the high-level idea of ApproxMC3 for processing/exploring the power-set
structure, and we propose new low-level techniques that are specific to MUS
counting.


4 AMUSIC: A Hashing-Based MUS Counter 


We now describe AMUSIC, a hashing-based algorithm designed to solve the
(ε, δ)-#MUS problem. The name of the algorithm is an acronym for Approximate
Minimal Unsatisfiable Subsets Implicit Counter. AMUSIC is based on ApproxMC3,
which is a hashing-based algorithm for solving the (ε, δ)-#SAT problem. As such, while
the high-level structure of AMUSIC and ApproxMC3 share close similarities, the
two algorithms differ significantly in the design of core technical subroutines.

We first discuss the high-level structure of AMUSIC in Sect. 4.1. We then 
present the key technical contributions of this paper: the design of core subrou- 
tines of AMUSIC in Sects. 4.3, 4.4 and 4.5. 


4.1 Algorithmic Overview 


The main procedure of AMUSIC is presented in Algorithm 1. The algorithm takes
as an input a Boolean formula F in CNF, a tolerance ε (> 0), and a confidence
parameter δ ∈ (0, 1], and returns an estimate of |AMU_F| within tolerance ε and
with confidence at least 1 − δ. Similar to ApproxMC3, we first check whether
|AMU_F| is smaller than a specific threshold that is a function of ε. This check is
carried out via a MUS enumeration algorithm, denoted FindMUSes, that returns
a set Y of MUSes of F such that |Y| = min(threshold, |AMU_F|). If |Y| < threshold,
the algorithm terminates while identifying the exact value of |AMU_F|. In a
significant departure from ApproxMC3, AMUSIC subsequently computes the union
(UMU_F) and the intersection (IMU_F) of all MUSes of F by invoking the
subroutines getUMU and getIMU, respectively. Through the lens of set representation
of the CNF formulas, we can view UMU_F as another CNF formula, G. Our key
observation is that AMU_F = AMU_G (see Sect. 4.2), thus instead of working with the
whole F, we can focus only on G. The rest of the main procedure is similar to
ApproxMC3, i.e., we repeatedly invoke the core subroutine called AMUSICCore.
The subroutine attempts to find an estimate c of |AMU_G| within the tolerance
ε. Briefly, to find the estimate, the subroutine partitions P(G) into nCells cells,
then picks one of the cells, and counts the number nSols of MUSes in the cell.
The pair (nCells, nSols) is returned by AMUSICCore, and the estimate c of |AMU_G|
is then computed as nSols × nCells. There is a small chance that AMUSICCore
fails to find the estimate; in such a case nCells = nSols = null. Individual
estimates are stored in a list C. After the final invocation of AMUSICCore, AMUSIC
computes the median of the list C and returns the median as the final estimate
of |AMU_G|. The total number of invocations of AMUSICCore is in O(log(1/δ)),
which is enough to ensure the required confidence 1 − δ (details on assurance of
the (ε, δ) guarantees are provided in Sect. 4.2).

Algorithm 1: AMUSIC(F, ε, δ)
 1  threshold ← 1 + 9.84 (1 + ε/(1 + ε)) (1 + 1/ε)²
 2  Y ← FindMUSes(F, threshold)
 3  if |Y| < threshold then return |Y|
 4  G ← getUMU(F)
 5  I_G ← getIMU(G)
 6  nCells ← 2; C ← emptyList; iter ← 0
 7  while iter < ⌈17 log₂(3/δ)⌉ do
 8      iter ← iter + 1
 9      (nCells, nSols) ← AMUSICCore(G, I_G, threshold, nCells)
10      if nCells ≠ null then AddToList(C, nCells × nSols)
11  return FindMedian(C)
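The threshold computation and the median aggregation of Algorithm 1 can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: `core` is a hypothetical stand-in for AMUSICCore that returns a pair (nCells, nSols) or None on failure, the initial enumeration and the computation of UMU_F and IMU_G are omitted, and the number of iterations is passed in explicitly instead of being derived from δ.

```python
from statistics import median

def threshold(eps):
    """Cell-size threshold from line 1 of Algorithm 1."""
    return 1 + 9.84 * (1 + eps / (1 + eps)) * (1 + 1 / eps) ** 2

def amusic_outer_loop(core, eps, iterations):
    """Collect nCells * nSols from each successful core call; return the median."""
    estimates = []
    n_cells = 2
    for _ in range(iterations):
        result = core(threshold(eps), n_cells)
        if result is None:
            continue  # the core call failed; keep the previous n_cells
        n_cells, n_sols = result
        estimates.append(n_cells * n_sols)
    return median(estimates) if estimates else None

# Hypothetical core that replays fixed answers, only to exercise the loop.
_answers = iter([(4, 10), None, (8, 6), (4, 11)])

def replay_core(thresh, prev_n_cells):
    return next(_answers)

estimate = amusic_outer_loop(replay_core, eps=0.8, iterations=4)
print(threshold(0.8), estimate)
```

Keeping the previous `n_cells` after a failed call is a choice made for this sketch; the pseudocode does not spell out what is passed to the next call in that case.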

We now turn to AMUSICCore, which is described in Algorithm 2. The partition
of P(G) into nCells cells is made via a hash function h from H_xor(|G|, m),
i.e., nCells = 2^m. The choice of m is a crucial part of the algorithm as it
regulates the size of the cells. Intuitively, it is easier to identify all MUSes of a small
cell; on the other hand, the use of small cells does not allow us to achieve a
reasonable tolerance. Based on ApproxMC3, we choose m such that a cell given
by a hash function h ∈ H_xor(|G|, m) contains almost threshold many MUSes. In
particular, the computation of AMUSICCore starts by choosing at random a hash
function h from H_xor(|G|, |G| − 1) and a cell α at random from {0,1}^(|G|−1).
Subsequently, the algorithm attempts to identify the m-th prefixes h^(m) and α^(m) of h
and α, respectively, such that |AMU_⟨G,h^(m),α^(m)⟩| < threshold and
|AMU_⟨G,h^(m−1),α^(m−1)⟩| ≥ threshold. Recall that
AMU_⟨G,h^(1),α^(1)⟩ ⊇ ··· ⊇ AMU_⟨G,h^(|G|−1),α^(|G|−1)⟩ (Observation 1, Sect. 2).
We also know that the cell α^(0), i.e., the whole P(G), contains at
least threshold MUSes (see Algorithm 1, line 3). Consequently, there can exist at
most one such m, and it exists if and only if |AMU_⟨G,h^(|G|−1),α^(|G|−1)⟩| < threshold.
Therefore, the algorithm first checks whether |AMU_⟨G,h^(|G|−1),α^(|G|−1)⟩| < threshold.
The check is carried out via a procedure CountInCell that returns the number nSols =
min(|AMU_⟨G,h^(|G|−1),α^(|G|−1)⟩|, threshold). If nSols = threshold, then AMUSICCore
fails to find the estimate of |AMU_G| and terminates. Otherwise, a procedure
LogMUSSearch is used to find the required value of m together with the
number nSols of MUSes in α^(m). The implementation of LogMUSSearch is directly
adopted from ApproxMC3 and thus we do not provide its pseudocode here (note
that in ApproxMC3 the procedure is called LogSATSearch). We only briefly
summarize two main ingredients of the procedure. First, it has been observed that
the required value of m is often similar for repeated calls of AMUSICCore.
Therefore, the algorithm keeps the value mPrev of m from the previous iteration and first
tests values near mPrev. If none of the near values is the required one, the
algorithm exploits that AMU_⟨G,h^(1),α^(1)⟩ ⊇ ··· ⊇ AMU_⟨G,h^(|G|−1),α^(|G|−1)⟩, which allows
it to find the required value of m via a galloping search (a variation of binary
search) while performing only log |G| calls of CountInCell.

Algorithm 2: AMUSICCore(G, I_G, threshold, prevNCells)
 1  Choose h at random from H_xor(|G|, |G| − 1)
 2  Choose α at random from {0,1}^(|G|−1)
 3  nSols ← CountInCell(G, I_G, h, α, threshold)
 4  if nSols = threshold then return (null, null)
 5  mPrev ← log₂ prevNCells
 6  (nCells, nSols) ← LogMUSSearch(G, I_G, h, α, threshold, mPrev)
 7  return (nCells, nSols)
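The monotonicity of the prefix cells is what makes a logarithmic search for m possible. The sketch below uses a plain binary search as a simplified stand-in for the galloping search of LogMUSSearch and ignores the mPrev heuristic; `count_in_cell` is a hypothetical callable that returns min(number of MUSes in the m-th prefix cell, threshold).

```python
def find_prefix_length(count_in_cell, max_m, threshold):
    """Return (m, nSols) with count(m) < threshold <= count(m - 1), or None.

    count_in_cell(m) must be non-increasing in m and capped at threshold;
    count_in_cell(0) is assumed to equal threshold.
    """
    cache = {}

    def count(m):
        if m not in cache:
            cache[m] = count_in_cell(m)
        return cache[m]

    if count(max_m) >= threshold:
        return None  # even the smallest cell holds too many MUSes
    lo, hi = 0, max_m  # invariant: count(lo) >= threshold > count(hi)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if count(mid) >= threshold:
            lo = mid
        else:
            hi = mid
    return hi, count(hi)

sizes = [100, 60, 30, 14, 8, 3]  # hypothetical MUS counts for m = 0..5

def capped(m):
    return min(sizes[m], 20)

result = find_prefix_length(capped, max_m=5, threshold=20)
print(result)  # (3, 14)
```

With the result (3, 14), the estimate would be 2^3 × 14 = 112, to be compared with the 100 MUSes assumed for the whole space in this toy setting.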

Note that in ApproxMC3, the procedure CountInCell is called BSAT and it is
implemented via an NP oracle, whereas we use a Σ_3^P oracle to implement the
procedure (see Sect. 4.3). The high-level functionality is the same: the procedures
use up to threshold calls of the oracle to check whether the number of the target
elements (models vs. MUSes) in a cell is lower than threshold.


4.2 Analysis and Comparison with ApproxMC3 


Following from the discussion above, there are three crucial technical differences
between AMUSIC and ApproxMC3: (1) the implementation of the subroutine
CountInCell in the context of MUS, (2) computation of the intersection IMU_F of
all MUSes of F and its usage in CountInCell, and (3) computation of the union
UMU_F of all MUSes of F and invocation of the underlying subroutines with G (i.e.,
UMU_F) instead of F. The usage of CountInCell can be viewed as a domain-specific
instantiation of BSAT in the context of MUSes. Furthermore, we use the
computed intersection of MUSes to improve the runtime efficiency of CountInCell. It
is perhaps worth mentioning that prior studies have observed that over 99% of
the runtime of ApproxMC3 is spent inside the subroutine BSAT [52]. Therefore,
the runtime efficiency of CountInCell is crucial for the runtime performance of
AMUSIC, and we discuss in detail, in Sect. 4.3, algorithmic contributions in the
context of CountInCell including usage of IMU_F. We now argue that the
replacement of F with G in line 4 in Algorithm 1 does not affect correctness guarantees,
which is stated formally below:


Lemma 1. For every G′ such that UMU_F ⊆ G′ ⊆ F, the following hold:


    AMU_F = AMU_G′    (1)
    IMU_F = IMU_G′    (2)


Proof. (1) Since G′ ⊆ F, every MUS of G′ is also a MUS of F. In the other
direction, every MUS of F is contained in the union UMU_F of all MUSes of F,
and thus every MUS of F is also a MUS of G′ (⊇ UMU_F).


(2) IMU_F = ⋂_{M ∈ AMU_F} M = ⋂_{M ∈ AMU_G′} M = IMU_G′.


Equipped with Lemma 1, we now argue that each run of AMUSIC can be
simulated by a run of ApproxMC3 for an appropriately chosen formula. Given
an unsatisfiable formula F = {f_1, ..., f_|F|}, let B_F denote a satisfiable
formula such that: (1) Vars(B_F) = {x_1, ..., x_|F|} and (2) an assignment
I : Vars(B_F) → {1, 0} is a model of B_F iff {f_i | I(x_i) = 1} is a MUS of F.
Informally, models of B_F map one-to-one to MUSes of F. Hence, the size of sets
returned by CountInCell for F is identical to the corresponding BSAT for B_F.
Since the analysis of ApproxMC3 only depends on the correctness of the size of
the set returned by BSAT, we conclude that the answer computed by AMUSIC
satisfies the (ε, δ) guarantees. Furthermore, observing that CountInCell makes
threshold many queries to a Σ_3^P oracle, we can bound the time complexity.
Formally,


Theorem 1. Given a formula F, a tolerance ε > 0, and a confidence 1 − δ ∈
(0, 1], let AMUSIC(F, ε, δ) return c. Then Pr[|AMU_F|/(1 + ε) ≤ c ≤ |AMU_F| · (1 + ε)]
≥ 1 − δ. Furthermore, AMUSIC makes O(log |F| · (1/ε²) · log(1/δ)) calls to a Σ_3^P
oracle.


A few words are in order concerning the complexity of AMUSIC. As noted in
Sect. 1, for a formula on n variables, approximate model counters make
O(log n · (1/ε²) · log(1/δ)) calls to an NP oracle, whereas the problem of finding a
satisfying assignment is NP-complete. In our case, we make calls to a Σ_3^P oracle while the
problem of finding a MUS is in FP^NP. Therefore, a natural direction of future
work is to investigate the design of a hashing-based technique that employs an
FP^NP oracle.

Algorithm 3: CountInCell(G, I_G, h, α, threshold)
 1  c ← 0; ℳ ← {}
 2  while c < threshold do
 3      M ← GetMUS(G, I_G, ℳ, h, α)
 4      if M = null then return c
 5      ℳ ← ℳ ∪ {M}
 6      c ← c + 1
 7  return c


4.3 Counting MUSes in a Cell: CountInCell 


In this section, we describe the procedure CountInCell. The input of the
procedure is the formula G (i.e., UMU_F), the set I_G = IMU_G, a hash function
h ∈ H_xor(|G|, m), a cell α ∈ {0,1}^m, and the threshold value. The output is
c = min(threshold, |AMU_⟨G,h,α⟩|).

The description is provided in Algorithm 3. The algorithm iteratively calls
a procedure GetMUS that returns either a MUS M such that M ∈ (AMU_⟨G,h,α⟩ \ ℳ)
or null if there is no such MUS. For each M, the value of c is increased and M is
added to ℳ. The loop terminates either when c reaches the value of threshold or
when GetMUS fails to find a new MUS (i.e., returns null). Finally, the algorithm
returns c.
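The loop of Algorithm 3 can be sketched with an exhaustive search in place of the QBF-based GetMUS. Everything below is an illustrative assumption: the brute-force `get_mus`, the toy formula, and the one-row hash are ours, and the sketch does not use I_G at all.

```python
from itertools import combinations, product

def is_sat(clauses):
    """Brute-force satisfiability check; literals are signed integers."""
    variables = sorted({abs(l) for c in clauses for l in c})
    for bits in product([False, True], repeat=len(variables)):
        val = dict(zip(variables, bits))
        if all(any(val[abs(l)] == (l > 0) for l in c) for c in clauses):
            return True
    return False

def is_mus(clauses):
    """Unsatisfiable, and removing any single clause makes it satisfiable."""
    return (not is_sat(clauses)) and all(
        is_sat(clauses[:i] + clauses[i + 1:]) for i in range(len(clauses)))

def in_cell(subset, coeffs, alpha):
    """Check h(subset) == alpha for XOR rows [a_i0, a_i1, ..., a_in]."""
    for row, target in zip(coeffs, alpha):
        bit = row[0]
        for k in subset:            # clause index k corresponds to a_{i,k+1}
            bit ^= row[k + 1]
        if bit != target:
            return False
    return True

def get_mus(formula, found, coeffs, alpha):
    """Exhaustive stand-in for GetMUS: a new MUS in the cell, or None."""
    n = len(formula)
    for size in range(1, n + 1):
        for subset in map(frozenset, combinations(range(n), size)):
            if subset in found or not in_cell(subset, coeffs, alpha):
                continue
            if is_mus([formula[i] for i in sorted(subset)]):
                return subset
    return None

def count_in_cell(formula, coeffs, alpha, threshold):
    found, c = set(), 0
    while c < threshold:
        mus = get_mus(formula, found, coeffs, alpha)
        if mus is None:
            return c
        found.add(mus)
        c += 1
    return c

# Toy formula with MUSes {f0, f1} and {f0, f2, f3}; the single hash row
# tests membership of f1, so the two MUSes fall into different cells.
F = [(1,), (-1,), (2,), (-1, -2), (3,)]
ROW = [0, 0, 1, 0, 0, 0]
print(count_in_cell(F, [ROW], [1], 10), count_in_cell(F, [ROW], [0], 10))
```

Passing no hash rows makes the cell the whole power set, so the call returns min(threshold, number of MUSes).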


GetMUS. To implement the procedure GetMUS, we build an ∃∀∃-QBF formula
MUSInCell such that each witness of the formula corresponds to a MUS from
AMU_⟨G,h,α⟩ \ ℳ. The formula consists of several parts and uses several sets of
variables that are described in the following.

The main part of the formula, shown in Eq. (3), introduces the first existential
quantifier and a set P = {p_1, ..., p_|G|} of variables that are quantified by the
quantifier. Note that each valuation I of P corresponds to a subset S of G; in
particular, let I_{P,G} denote the set {f_i ∈ G | I(p_i) = 1}. The formula is built
in such a way that a valuation I is a witness of the formula if and only if I_{P,G}
is a MUS from AMU_⟨G,h,α⟩ \ ℳ. This property is expressed via three conjuncts,
denoted inCell(P), unexplored(P), and isMUS(P), encoding that (i) I_{P,G} is
in the cell α, (ii) I_{P,G} is not in ℳ, and (iii) I_{P,G} is a MUS, respectively.


$$\mathtt{MUSInCell} = \exists P.\ \mathtt{inCell}(P) \wedge \mathtt{unexplored}(P) \wedge \mathtt{isMUS}(P) \tag{3}$$

Recall that the family H_xor(n, m) of hash functions is defined as {h | h(y)[i] =
a_{i,0} ⊕ (⊕_{k=1}^{n} a_{i,k} ∧ y[k]) for all 1 ≤ i ≤ m}, where a_{i,k} ∈ {0, 1} (Sect. 2). A hash
function h ∈ H_xor(n, m) is given by fixing the values of individual a_{i,k}, and a cell
α of h is a bit-vector from {0,1}^m. The formula inCell(P) encoding that the
set I_{P,G} is in the cell α of h is shown in Eq. (4).


$$\mathtt{inCell}(P) = \bigwedge_{i=1}^{m} \Bigl( a_{i,0} \oplus \Bigl( \bigoplus_{p \in \{p_k \mid a_{i,k} = 1\}} p \Bigr) \oplus \neg\alpha[i] \Bigr) \tag{4}$$


To encode that we are not interested in MUSes from ℳ, we can simply
block all the valuations of P that correspond to these MUSes. However, we can
do better. In particular, recall that if M is a MUS, then no proper subset and
no proper superset of M can be a MUS; thus, we prune away all these sets from
the search space. The corresponding formula is shown in Eq. (5).


$$\mathtt{unexplored}(P) = \bigwedge_{M \in \mathcal{M}} \Bigl( \Bigl( \bigvee_{f_i \in M} \neg p_i \Bigr) \wedge \Bigl( \bigvee_{f_i \notin M} p_i \Bigr) \Bigr) \tag{5}$$

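The effect of the pruning in Eq. (5) can be checked on explicit sets. In the sketch below, which uses our own helper name `unexplored` and a hypothetical universe of four clauses, a candidate survives iff it is neither a subset nor a superset of any MUS found so far.

```python
from itertools import combinations

def unexplored(subset, found):
    """Eq. (5) on explicit sets: for every known MUS m, the candidate misses
    some clause of m and contains some clause outside m."""
    return all(not m <= subset and not subset <= m for m in found)

universe = range(4)
found = [frozenset({0, 1})]  # one MUS discovered so far
survivors = [s for size in range(len(universe) + 1)
             for s in map(frozenset, combinations(universe, size))
             if unexplored(s, found)]

print(len(survivors))  # 9 of the 16 subsets remain
```

A single MUS of size 2 thus removes 7 of the 16 candidates: its 4 subsets and its 4 supersets, with the MUS itself counted once.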
The formula isMUS(P) encoding that I_{P,G} is a MUS is shown in Eq. (6).
Recall that I_{P,G} is a MUS if and only if I_{P,G} is unsatisfiable and for every closest
subset S of I_{P,G} it holds that S is satisfiable, where closest subset means that
|I_{P,G} \ S| = 1. We encode these two conditions using two subformulas denoted
by unsat(P) and noUnsatSubset(P).


$$\mathtt{isMUS}(P) = \mathtt{unsat}(P) \wedge \mathtt{noUnsatSubset}(P) \tag{6}$$


The formula unsat(P), shown in Eq. (7), introduces the set Vars(G) of
variables that appear in G and states that every valuation of Vars(G) falsifies at
least one clause contained in I_{P,G}.


$$\mathtt{unsat}(P) = \forall \mathit{Vars}(G).\ \bigvee_{f_i \in G} (p_i \wedge \neg f_i) \tag{7}$$


The formula noUnsatSubset(P), shown in Eq. (8), introduces another set of
variables: Q = {q_1, ..., q_|G|}. Similarly to the case of P, each valuation I of Q
corresponds to a subset of G defined as I_{Q,G} = {f_i ∈ G | I(q_i) = 1}. The formula
expresses that for every valuation I of Q it holds that I_{Q,G} is satisfiable or I_{Q,G}
is not a closest subset of I_{P,G}.


$$\mathtt{noUnsatSubset}(P) = \forall Q.\ \mathtt{sat}(Q) \vee \neg\mathtt{subset}(Q, P) \tag{8}$$


The requirement that I_{Q,G} is satisfiable is encoded in Eq. (9). Since we are
already reasoning about the satisfiability of G's clauses in Eq. (7), we introduce
here a copy G′ of G where each variable x_i of G is substituted by its primed copy
x_i′. Equation (9) states that there exists a valuation of Vars(G′) that satisfies
I_{Q,G}.


$$\mathtt{sat}(Q) = \exists \mathit{Vars}(G').\ \bigwedge_{f_i \in G'} (\neg q_i \vee f_i) \tag{9}$$


Equation (10) encodes that I_{Q,G} is a closest subset of I_{P,G}. To ensure that
I_{Q,G} is a subset of I_{P,G}, we add the clauses q_i → p_i. To ensure the
closeness, we use cardinality constraints. In particular, we introduce another set
R = {r_1, ..., r_|G|} of variables and enforce their values via r_i ↔ (p_i ∧ ¬q_i).
Intuitively, the number of variables from R that are set to 1 equals |I_{P,G} \ I_{Q,G}|.
Finally, we add cardinality constraints, denoted by exactlyOne(R), ensuring
that exactly one r_i is set to 1.


$$\mathtt{subset}(Q, P) = \exists R.\ \bigwedge_{p_i \in P} \bigl( (q_i \rightarrow p_i) \wedge (r_i \leftrightarrow (p_i \wedge \neg q_i)) \bigr) \wedge \mathtt{exactlyOne}(R) \tag{10}$$

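As an illustration of Eq. (10), the sketch below evaluates the closest-subset relation on explicit 0/1 vectors and shows one possible CNF encoding of exactlyOne(R). The pairwise encoding used here is an assumption chosen for brevity; the text does not state which cardinality encoding is used, and the helper names are ours.

```python
from itertools import combinations, product

def is_closest_subset(q_bits, p_bits):
    """Eq. (10) on explicit vectors: Q is a subset of P missing exactly one element."""
    if any(q and not p for q, p in zip(q_bits, p_bits)):
        return False  # violates q_i -> p_i
    r_bits = [bool(p and not q) for q, p in zip(q_bits, p_bits)]  # r_i <-> (p_i and not q_i)
    return sum(r_bits) == 1  # exactlyOne(R)

def exactly_one(lits):
    """Pairwise CNF encoding: one at-least-one clause plus at-most-one clauses."""
    return [list(lits)] + [[-a, -b] for a, b in combinations(lits, 2)]

def satisfies(assignment, clauses):
    """assignment maps variable -> bool; literals are signed integers."""
    return all(any(assignment[abs(l)] == (l > 0) for l in c) for c in clauses)

r_vars = [1, 2, 3, 4]
cnf = exactly_one(r_vars)
models = [bits for bits in product([False, True], repeat=len(r_vars))
          if satisfies(dict(zip(r_vars, bits)), cnf)]

print(len(cnf), len(models))
print(is_closest_subset([1, 0, 0], [1, 1, 0]))
```

The pairwise encoding needs no auxiliary variables but grows quadratically in |R|, which is why other encodings are often preferred for larger sets.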


Note that instead of encoding a closest subset in Eq. (10), we could just encode
that I_{Q,G} is an arbitrary proper subset of I_{P,G}, as it would still preserve the
meaning of Eq. (6) that I_{P,G} is a MUS. Such an encoding would not require introducing
the set R of variables and also, at first glance, would save the use of one
existential quantifier. However, the whole formula would still be in the form
of ∃∀∃-QBF due to Eq. (9) (which introduces the second existential quantifier).
The advantage of using a closest subset is that we significantly prune the search
space of the QBF solver. It thus depends on the contemporary QBF solver whether
it is more beneficial to reduce the number of variables (by removing R) or to
prune the search space via R.

For the sake of lucidity, we have not exploited the knowledge of IMU_G (I_G)
while presenting the above equations. Since we know that every clause f ∈ IMU_G
has to be contained in every MUS of G, we can fix the values of the variables
{p_i | f_i ∈ IMU_G} to 1. This, in turn, significantly simplifies the equations and
prunes away exponentially many (w.r.t. |IMU_G|) valuations of P, Q, and R that
need to be considered. To solve the final formula, we employ an ∃∀∃-QBF solver,
i.e., a Σ_3^P oracle.

Finally, one might wonder why we use our custom solution for identifying
MUSes in a cell instead of employing one of the existing MUS extraction techniques.
Conventional MUS extraction algorithms cannot be used to identify MUSes that
are in a cell since the cell is not “continuous” w.r.t. the set containment. In
particular, assume that we have three sets of clauses, K, L, M, such that K ⊂
L ⊂ M. It can be the case that K and M are in the cell, but L is not in the
cell. Contemporary MUS extraction techniques require the search space to be
continuous w.r.t. the set containment and thus cannot be used in our case.


4.4 Computing UMU_F


We now turn our attention to computing the union UMU_F (i.e., G) of all MUSes
of F. Let us start by describing the well-known concepts of autark variables and
a lean kernel. A set A ⊆ Vars(F) of variables is an autark of F iff there exists
a truth assignment to A such that every clause of F that contains a variable
from A is satisfied by the assignment [44]. It holds that the union of two autark
sets is also an autark set, thus there exists a unique largest autark set (see,
e.g., [31,32]). The lean kernel of F is the set of all clauses that do not contain
any variable from the largest autark set. It is known that the lean kernel of F
is an over-approximation of UMU_F (see e.g., [31,32]), and several algorithms,
e.g., [33,38], have been proposed for computing the lean kernel.
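The definitions of an autark set and the lean kernel can be executed literally on a small formula. The exhaustive search below is our own illustration of the definitions and is unrelated to the MaxSAT-based algorithms cited above; the helper names and the toy formula are assumptions.

```python
from itertools import combinations, product

def is_autark(var_set, formula):
    """True iff some assignment to var_set satisfies every clause that
    contains a variable from var_set."""
    variables = sorted(var_set)
    touched = [c for c in formula if any(abs(l) in var_set for l in c)]
    for bits in product([False, True], repeat=len(variables)):
        val = dict(zip(variables, bits))
        if all(any(abs(l) in val and val[abs(l)] == (l > 0) for l in c)
               for c in touched):
            return True
    return False

def lean_kernel(formula):
    """Clauses containing no variable of the largest autark set."""
    all_vars = sorted({abs(l) for c in formula for l in c})
    largest = set()
    # The union of autark sets is autark, so the union of all autark sets
    # is the largest one.
    for size in range(1, len(all_vars) + 1):
        for cand in combinations(all_vars, size):
            if is_autark(set(cand), formula):
                largest |= set(cand)
    return [c for c in formula if not any(abs(l) in largest for l in c)]

# Toy formula: x1, not x1, x2, (not x1 or not x2), x3
F = [(1,), (-1,), (2,), (-1, -2), (3,)]
print(lean_kernel(F))
```

Here {x3} is the only autark set, so the clause x3 is dropped and the remaining four clauses form the lean kernel, which in this toy case coincides with the union of all MUSes.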


Algorithm 4: getUMU(F)
 1  K ← the lean kernel of F; ℳ ← {}
 2  for f ∈ K \ {f ∈ M | M ∈ ℳ} do
 3      W ← checkNecessity(f, K)
 4      if W ≠ null then ℳ ← ℳ ∪ {a MUS of W}
 5      else K ← K \ {f}
 6  return K


Algorithm. Our approach for computing UMU_F consists of two parts. First, we
compute the lean kernel K of F to get an over-approximation of UMU_F, and
then we gradually refine the over-approximation K until K is exactly the set
UMU_F. The refinement is done by solving the MUS-membership problem for each
f ∈ K. To solve the MUS-membership problem efficiently, we reveal a connection
to necessary clauses, as stated in the following lemma.


Lemma 2. A clause f ∈ F belongs to UMU_F iff there is a subset W of F such
that W is unsatisfiable and f is necessary for W (i.e., W \ {f} is satisfiable).


Proof. ⇒: Let f ∈ UMU_F and M ∈ AMU_F such that f ∈ M. Since M is a MUS,
M \ {f} is satisfiable; thus f is necessary for M.

⇐: If W is a subset of F and f ∈ W is a necessary clause for W, then f has to
be contained in every MUS of W. Moreover, W has at least one MUS and since
W ⊆ F, every MUS of W is also a MUS of F.


Our approach for computing UMU_F is shown in Algorithm 4. It takes as
an input the formula F and outputs UMU_F (denoted K). Moreover, the
algorithm maintains a set ℳ of MUSes of F. Initially, ℳ = ∅ and K is set to the
lean kernel of F; we use an approach by Marques-Silva et al. [38] to compute the
lean kernel. At this point, we know that K ⊇ UMU_F ⊇ {f ∈ M | M ∈ ℳ}. To find
UMU_F, the algorithm iteratively determines for each f ∈ K \ {f ∈ M | M ∈ ℳ}
if f ∈ UMU_F. In particular, for each f, the algorithm checks whether there exists
a subset W of K such that f is necessary for W (Lemma 2). The task of finding
W is carried out by a procedure checkNecessity(f, K). If there is no such W,
then the algorithm removes f from K. In the other case, if W exists, the
algorithm finds a MUS of W and adds the MUS to the set ℳ. Any available single
MUS extraction approach, e.g., [2,5,7,46], can be used to find the MUS.
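The refinement loop of Algorithm 4 can be sketched with exhaustive search in place of the QBF query. The following is a toy illustration under explicit assumptions: `check_necessity` enumerates subsets instead of solving a QBF, `shrink_to_mus` is a simple deletion-based MUS extractor, K starts from F itself instead of the lean kernel, and clauses are assumed to be pairwise distinct.

```python
from itertools import combinations, product

def is_sat(clauses):
    """Brute-force satisfiability check; literals are signed integers."""
    variables = sorted({abs(l) for c in clauses for l in c})
    for bits in product([False, True], repeat=len(variables)):
        val = dict(zip(variables, bits))
        if all(any(val[abs(l)] == (l > 0) for l in c) for c in clauses):
            return True
    return False

def check_necessity(f, k):
    """Exhaustive stand-in for the QBF query of Lemma 2: an unsatisfiable
    W within k such that f is in W and W minus f is satisfiable, or None."""
    others = [g for g in k if g != f]
    for size in range(len(others) + 1):
        for rest in combinations(others, size):
            if is_sat(list(rest)) and not is_sat(list(rest) + [f]):
                return list(rest) + [f]
    return None

def shrink_to_mus(w):
    """Deletion-based extraction of one MUS from an unsatisfiable set w."""
    mus = list(w)
    for clause in list(w):
        trial = [c for c in mus if c != clause]
        if not is_sat(trial):
            mus = trial
    return mus

def get_umu(formula):
    k = list(formula)  # assumption: start from F, not from the lean kernel
    covered = set()    # clauses already known to belong to some MUS
    for f in list(formula):
        if f in covered:
            continue
        w = check_necessity(f, k)
        if w is None:
            k.remove(f)
        else:
            covered.update(shrink_to_mus(w))
    return k

F = [(1,), (-1,), (2,), (-1, -2), (3,)]
print(get_umu(F))
```

Clauses that appear in an extracted MUS are never queried, which is the reason for maintaining the set of found MUSes: each extracted MUS can save several membership checks.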

To implement the procedure checkNecessity(f, K) we build a QBF formula
that is true iff there exists a set W ⊆ K such that W is unsatisfiable and f is
necessary for W. To represent W we introduce a set S = {s_g | g ∈ K} of Boolean
variables; each valuation I of S corresponds to a subset I_{S,K} of K defined as
I_{S,K} = {g ∈ K | I(s_g) = 1}. Our encoding is shown in Eq. (11).


$$\exists S, \mathit{Vars}(K).\ \forall \mathit{Vars}(K').\ s_f \wedge \Bigl( \bigwedge_{g \in K \setminus \{f\}} (g \vee \neg s_g) \Bigr) \wedge \Bigl( \bigvee_{g \in K'} (\neg g \wedge s_g) \Bigr) \tag{11}$$


The formula consists of three main conjuncts. The first conjunct ensures that
f is present in I_{S,K}. The second conjunct states that I_{S,K} \ {f} is satisfiable,
i.e., that there exists a valuation of Vars(K) that satisfies I_{S,K} \ {f}. Finally,
the last conjunct expresses that I_{S,K} is unsatisfiable, i.e., that every valuation of
Vars(K′) falsifies at least one clause of I_{S,K}. Since we are already reasoning about
variables of K in the second conjunct, in the third conjunct, we use a primed
version (a copy) K′ of K.


Alternative QBF Encodings. Janota and Marques-Silva [30] proposed three
other QBF encodings for the MUS-membership problem, i.e., for deciding
whether a given f ∈ F belongs to UMU_F. Two of the three proposed
encodings are typically inefficient; thus, we focus on the third encoding, which is the
most concise among the three. The encoding, referred to as the JM encoding (after
the initials of the authors), uses only two quantifiers in the form of ∃∀-QBF
and it is only linear in size w.r.t. |F|. The underlying ideas of the JM encoding
and our encoding differ significantly. Our encoding is based on necessary clauses
(Lemma 2), whereas JM exploits a connection to so-called Maximal Satisfiable
Subsets. Both encodings use the same quantifiers; however, our encoding is
smaller. In particular, the JM encoding uses 2 × (|Vars(F)| + |F|) variables whereas our
encoding uses only |F| + 2 × |Vars(F)| variables, and leads to smaller formulas.


Implementation. Recall that we compute UMU_F to reduce the search space,
i.e., instead of working with the whole F, we work only with G = UMU_F. The
soundness of this reduction is witnessed in Lemma 1 (Sect. 4.2). In fact, Lemma 1
shows that it is sound to reduce the search space to any G′ such that UMU_F ⊆
G′ ⊆ F. Since our algorithm for computing UMU_F involves repeatedly solving
a Σ_2^P-complete problem, it can be very time-consuming. Therefore, instead of
computing the exact UMU_F, we optionally compute only an over-approximation
G′ of UMU_F. In particular, we set a (user-defined) time limit for computing the
lean kernel K of F. Moreover, we use a time limit for executing the procedure
checkNecessity(f, K); if the time limit is exceeded for a clause f ∈ K, we
conservatively assume that f ∈ UMU_F, i.e., we over-approximate.


Sparse Hashing and UMU_F. The computation of UMU_F is similar in spirit to
the computation of an independent support of a formula for designing sparse hash
functions [16,28]. Briefly, given a Boolean formula H, an independent support of
H is a set ℐ ⊆ Vars(H) such that in every model of H, the truth assignment to
ℐ uniquely determines the truth assignment to Vars(H) \ ℐ. Practically, an
independent support can be used to reduce the search space where a model counting
algorithm searches for models of H. It is interesting to note that the
state-of-the-art technique reduces the computation of an independent support of a formula
in the context of model counting to that of computing a (Group) Minimal
Unsatisfiable Subset (GMUS). Thus, a formal study of the computation of independent
support in the context of MUSes is an interesting direction of future work.




Algorithm 5: getIMU(G)
 1  C ← G
 2  K ← ∅
 3  while C ≠ ∅ do
 4      f ← choose f ∈ C
 5      (sat?, I, core) ← checkSAT(G \ {f})
 6      if sat? then
 7          R ← RMR(G, f, I)
 8          K ← K ∪ {f} ∪ R
 9          C ← C \ ({f} ∪ R)
10      else
11          C ← C ∩ core
12  return K


4.5 Computing IMU_G


Our approach to compute the intersection IMU_G (i.e., I_G) of all MUSes of G is
composed of several ingredients. First, recall that a clause f ∈ G belongs to IMU_G
iff f is necessary for G. Another ingredient is the ability of contemporary SAT
solvers to provide either a model of a given formula N ⊆ G or, if N is
unsatisfiable, an unsat core of N, i.e., a small, yet not necessarily minimal,
unsatisfiable subset of N. The
final ingredient is a technique called model rotation. The technique was originally
proposed by Marques-Silva and Lynce [40], and it serves to explore necessary
clauses based on other already known necessary clauses. In particular, let f be
a necessary clause for G and I : Vars(G) → {0, 1} a model of G \ {f}. Since
G is unsatisfiable, the model I does not satisfy f. The model rotation attempts
to alter I by switching, one by one, the Boolean assignment to the variables
Vars({f}). Each variable assignment I′ that originates from such an alteration
of I necessarily satisfies f and does not satisfy at least one f′ ∈ G. If it is the
case that there is exactly one such f′, then f′ is necessary for G. An improved
version of model rotation, called recursive model rotation, was later proposed
by Belov and Marques-Silva [6] who noted that the model rotation could be
recursively performed on the newly identified necessary clauses.

Our approach for computing IMU_G is shown in Algorithm 5. To find IMU_G,
the algorithm decides for each f whether f is necessary for G. In particular, the
algorithm maintains two sets: a set C of candidates for necessary clauses and
a set K of already known necessary clauses. Initially, K is empty and C = G. At
the end of the computation, C is empty and K equals IMU_G. The algorithm works
iteratively. In each iteration, the algorithm picks a clause f ∈ C and checks
G \ {f} for satisfiability via a procedure checkSAT. Moreover, checkSAT returns
either a model I or an unsat core core of G \ {f}. If G \ {f} is satisfiable, i.e., f is
necessary for G, the algorithm employs the recursive model rotation, denoted by
RMR(G, f, I), to identify a set R of additional necessary clauses. Subsequently,
all the newly identified necessary clauses are added to K and removed from C.
In the other case, when G \ {f} is unsatisfiable, the set C is reduced to C ∩ core
since every necessary clause of G has to be contained in every unsatisfiable subset
of G. Note that f ∉ core, thus at least one clause is removed from C.
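Algorithm 5 together with recursive model rotation can be sketched on clause indices as follows. This is a toy illustration, not the implementation used in the tool: `check_sat` is a brute-force stand-in for a SAT solver call, its unsat core is simply the whole remaining formula (a real solver would usually return a smaller core), and the input formula is assumed to be unsatisfiable.

```python
from itertools import product

def falsified(formula, model):
    """Indices of clauses not satisfied by a total model (variable -> bool)."""
    return [i for i, c in enumerate(formula)
            if not any(model[abs(l)] == (l > 0) for l in c)]

def check_sat(formula, skip, variables):
    """Check the formula without clause index skip.
    Returns (True, model, None) or (False, None, core)."""
    for bits in product([False, True], repeat=len(variables)):
        model = dict(zip(variables, bits))
        if set(falsified(formula, model)) <= {skip}:
            return True, model, None
    return False, None, {i for i in range(len(formula)) if i != skip}

def rmr(formula, f, model, known):
    """Recursive model rotation from necessary clause f, whose only falsifying
    model is given; newly found necessary clauses are added to known."""
    for lit in formula[f]:
        rotated = dict(model)
        rotated[abs(lit)] = not rotated[abs(lit)]
        bad = falsified(formula, rotated)
        if len(bad) == 1 and bad[0] not in known:
            known.add(bad[0])
            rmr(formula, bad[0], rotated, known)

def get_imu(formula):
    variables = sorted({abs(l) for c in formula for l in c})
    candidates, necessary = set(range(len(formula))), set()
    while candidates:
        f = min(candidates)
        sat, model, core = check_sat(formula, f, variables)
        if sat:
            found = {f}
            rmr(formula, f, model, found)
            necessary |= found
            candidates -= found
        else:
            candidates &= core
    return necessary

G1 = [(1,), (-1,), (2,), (-1, -2)]  # MUSes {0, 1} and {0, 2, 3}
G2 = [(1,), (2,), (-1, -2)]         # a single MUS: every clause is necessary
print(sorted(get_imu(G1)), sorted(get_imu(G2)))
```

On the second formula, one satisfiability check followed by model rotation already identifies all three necessary clauses, which illustrates how rotation saves solver calls.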


5 Experimental Evaluation 


We employed several external tools to implement AMUSIC. In particular, we use
the QBF solver CAQE [49] for solving the QBF formula MUSInCell, the 2QBF
solver CADET [50] for solving our ∃∀-QBF encoding while computing UMU_F, and
the QBF preprocessor QRATPre+ [37] for preprocessing/simplifying our QBF
encodings. Moreover, we employ muser2 [7] for a single MUS extraction while
computing UMU_F, a MaxSAT solver UWrMaxSat [48] to implement the algorithm
by Marques-Silva et al. [38] for computing the lean kernel of F, and finally, we
use a toolkit called pysat [27] for encoding cardinality constraints used in the
formula MUSInCell. The tool along with all benchmarks that we used is available
at https://github.com/jar-ben/amusic.


Objectives. As noted earlier, AMUSIC is the first technique to (approximately)
count MUSes without explicit enumeration. We demonstrate the efficacy of our
approach via a comparison with two state-of-the-art techniques for MUS
enumeration: MARCO [35] and MCSMUS [3]. Within a given time limit, a MUS
enumeration algorithm either identifies the whole AMU_F, i.e., provides the exact value of
|AMU_F|, or identifies just a subset of AMU_F, i.e., provides an under-approximation
of |AMU_F| with no approximation guarantees.

The objective of our empirical evaluation was two-fold: First, we
experimentally examine the scalability of AMUSIC, MARCO, and MCSMUS w.r.t. |AMU_F|.
Second, we examine the empirical accuracy of AMUSIC.


Benchmarks and Experimental Setup. Given the lack of dedicated counting
techniques, there is no sufficiently large set of publicly available benchmarks to
perform critical analysis of counting techniques. To this end, we focused on
a recently emerging theme of evaluation of SAT-related techniques on scalable
benchmarks¹. In keeping with prior studies employing empirical methodology
based on scalable benchmarks [22,41], we generated a custom collection of CNF
benchmarks. The benchmarks mimic requirements on multiprocessing systems.
Assume that we are given a system with two groups (kinds) of processes, A =
{a_1, ..., a_|A|} and B = {b_1, ..., b_|B|}, such that |A| > |B|. The processes require
resources of the system; however, the resources are limited. Therefore, there
are restrictions on which processes can be active simultaneously. In particular,
we have the following three types of mutually independent restrictions on the
system:


¹ M. Y. Vardi, in his talk at the BIRS CMO 18w5208 workshop, called on the SAT
community to focus on scalable benchmarks in lieu of competition benchmarks. Also,
see: https://gitlab.com/satisfiability/scalablesat (Accessed: May 10, 2020).

Fig. 3. The number of completed iterations and the accuracy of the final MUS count
estimate for individual benchmarks.


– The first type of restriction states that “at most k − 1 processes from the
group A can be active simultaneously”, where k ≤ |A|.

– The second type of restriction enforces that “if no process from B is active
then at most k − 1 processes from A can be active, and if at least one process
from B is active then at most l − 1 processes from A can be active”, where
k, l ≤ |A|.

– The third type of restriction includes the second restriction. Moreover, we
assume that a process from B can activate a process from A. In particular,
for every b_i ∈ B, we assume that when b_i is active, then a_i is also active.


We encode the three restrictions via three Boolean CNF formulas, $R_1$, $R_2$, $R_3$.
The formulas use three sets of variables: $X = \{x_1, \ldots, x_{|A|}\}$, $Y = \{y_1, \ldots, y_{|B|}\}$,
and $Z$. The sets $X$ and $Y$ represent the Boolean information about activity of
processes from A and B: $a_i$ is active iff $x_i = 1$ and $b_j$ is active iff $y_j = 1$. The
set $Z$ contains additional auxiliary variables. Moreover, we introduce a formula
$ACT = (\bigwedge_{x_i \in X} x_i) \wedge (\bigwedge_{y_i \in Y} y_i)$ encoding that all processes are active. For each
$i \in \{1, 2, 3\}$, the conjunction $G_i = R_i \wedge ACT$ is unsatisfiable. Intuitively, every
MUS of $G_i$ represents a minimal subset of processes that need to be active
to violate the restriction. The number of MUSes in $G_1$, $G_2$, and $G_3$ is $\binom{|A|}{k}$,
$\binom{|A|}{k} + |B| \times \binom{|A|}{l}$, and $\binom{|A|}{k} + \sum_{i=1}^{|B|} \binom{|B|}{i} \times \binom{|A|-i}{l-i}$, respectively. We generated
$G_1$, $G_2$, and $G_3$ for these values: $10 \le |A| \le 30$, $2 \le |B| \le 6$, $\lfloor |A|/2 \rfloor \le k \le \lfloor 3|A|/4 \rfloor$,
and $l = k - 1$. In total, we obtained 1353 benchmarks (formulas) that range in
their size from 78 to 361 clauses, use from 40 to 152 variables, and contain from
120 to $1.7 \times 10^9$ MUSes.
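As a sanity check on the benchmark description, the following sketch evaluates the closed-form MUS counts and enumerates the parameter grid. It is an illustration under stated assumptions, not the generator used in the evaluation: the counts are read as C(|A|,k) for G1, C(|A|,k) + |B| x C(|A|,l) for G2, and C(|A|,k) + sum over i = 1..|B| of C(|B|,i) x C(|A|-i, l-i) for G3; the range of k is assumed to be floor(|A|/2) <= k <= floor(3|A|/4); and G1, which does not mention B, is assumed to be generated once per pair (|A|, k). All function and variable names are hypothetical.

```python
from math import comb

def mus_counts(a, b, k, l):
    """Closed-form MUS counts of G1, G2, G3 for |A| = a and |B| = b (assumed reading)."""
    g1 = comb(a, k)
    g2 = comb(a, k) + b * comb(a, l)
    # Terms with l - i < 0 contribute nothing; math.comb rejects negative arguments.
    g3 = comb(a, k) + sum(comb(b, i) * comb(a - i, l - i)
                          for i in range(1, b + 1) if l - i >= 0)
    return g1, g2, g3

def parameter_grid():
    """Yield (a, b, k, l) over the assumed ranges, with l = k - 1."""
    for a in range(10, 31):
        for k in range(a // 2, (3 * a) // 4 + 1):
            for b in range(2, 7):
                yield a, b, k, k - 1

# G1 does not depend on B, so it is counted once per (a, k) pair (assumption).
g1_instances = {(a, k) for a, _, k, _ in parameter_grid()}
g23_instances = list(parameter_grid())

total = len(g1_instances) + 2 * len(g23_instances)
smallest = min(comb(a, k) for a, k in g1_instances)
largest = max(max(mus_counts(*p)) for p in g23_instances)
print(total, smallest, f"{largest:.2e}")
```

Under these assumptions the grid contains 1353 formulas and the smallest MUS count is 120, which agrees with the figures reported for the benchmark set.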

All experiments were run using a time limit of 7200s and computed on an 
AMD EPYC 7371 16-Core Processor, 1 TB memory machine running Debian 
Linux 4.19.67-2. The values of $\varepsilon$ and $\delta$ were set to 0.8 and 0.2, respectively.
Fig. 4. Scalability of AMUSIC, MARCO, and MCSMUS w.r.t. $|\mathrm{AMU}_F|$.


Accuracy. Recall that to compute an estimate $c$ of $|\mathrm{AMU}_F|$, AMUSIC performs
multiple iterations of AMUSICCore to get a list $C$ of estimates
of $|\mathrm{AMU}_F|$, and then uses the median of $C$ as the final estimate $c$. The more
iterations are performed, the higher the confidence that $c$ is within the required
tolerance $\varepsilon = 0.8$, i.e., that $|\mathrm{AMU}_F| / 1.8 \le c \le 1.8 \cdot |\mathrm{AMU}_F|$. To achieve the confidence
$1 - \delta = 0.8$, 66 iterations need to be performed. For 157 benchmarks, the
algorithm was not able to finish even a single iteration, and for only 251
benchmarks did the algorithm finish all 66 iterations. For the remaining 945
benchmarks, at least some iterations were finished, and thus at least an estimate
with a lower confidence was determined.
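The median step and the tolerance check can be sketched as follows. This is a minimal illustration, not AMUSIC's implementation: the per-iteration estimates are made-up numbers, and the helper names are hypothetical. It shows why the median is used: a single wild estimate does not move the final answer.

```python
from statistics import median

def final_estimate(estimates):
    """Combine the estimates of the finished iterations by taking their median."""
    return median(estimates)

def within_tolerance(c, exact, eps):
    """Check exact / (1 + eps) <= c <= (1 + eps) * exact."""
    return exact / (1 + eps) <= c <= (1 + eps) * exact

# Hypothetical outputs of five iterations for a formula with exactly 100 MUSes.
estimates = [96.0, 128.0, 64.0, 112.0, 2048.0]
c = final_estimate(estimates)
print(c, within_tolerance(c, exact=100, eps=0.8))
```

Here the outlier 2048 is ignored by the median, so the final estimate stays inside the tolerance band even though one iteration was far off.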

We illustrate the achieved results in Fig. 3. The figure consists of two plots. 
The plot at the bottom of the figure shows the number of finished iterations (y- 
axis) for individual benchmarks (x-axis). The plot at the top of the figure shows 
how accurate the MUS count estimates were. In particular, for each benchmark
(formula) $F$, we show the number $c / |\mathrm{AMU}_F|$, where $c$ is the final estimate (median
of estimates from finished iterations). For benchmarks where all iterations were
completed, it was always the case that the final estimate is within the required 
tolerance, although we had only 0.8 theoretical confidence that it would be the 
case. Moreover, the achieved estimate never exceeded a tolerance of 0.1, which 
is much better than the required tolerance of 0.8. As for the benchmarks where 
only some iterations were completed, there is only a single benchmark where the 
tolerance of 0.8 was exceeded. 


Scalability. The scalability of AMUSIC, MARCO, and MCSMUS w.r.t. the number
of MUSes ($|\mathrm{AMU}_F|$) is illustrated in Fig. 4. In particular, for each benchmark
(x-axis), we show in the plot the estimate of the MUS count that was achieved 
by the algorithms (y-axis). The benchmarks are sorted by the exact count of 
MUSes in the benchmarks. MARCO and MCSMUS were able to finish the MUS 
enumeration, and thus to provide the count, only for benchmarks that contained 
at most 10° and 10° MUSes, respectively. AMUSIC, on the other hand, was able 
to provide estimates on the MUS count even for benchmarks that contained up to 
10° MUSes. Moreover, as we have seen in Fig. 3, the estimates are very accurate.
Only for the 157 benchmarks where AMUSIC finished no iteration could it
not provide any estimate.


6 Summary and Future Work 


We presented a probabilistic algorithm, called AMUSIC, for approximate MUS 
counting that needs to explicitly identify only logarithmically many MUSes and 
yet still provides strong theoretical guarantees. The high-level idea is adopted 
from a model counting algorithm ApproxMC3: we partition the search space into 
small cells, then count MUSes in a single cell, and estimate the total count by 
scaling the count from the cell. The novelty lies in the low-level algorithmic parts 
that are specific for MUSes. Mainly, (1) we propose a QBF encoding for counting
MUSes in a cell, (2) we exploit MUS intersection to speed up localization of
MUSes, and (3) we utilize MUS union to reduce the search space significantly.
Our experimental evaluation showed that AMUSIC outscales contemporary
enumeration-based counters by several orders of magnitude. Moreover, the
practical accuracy of AMUSIC is significantly better than what the theoretical
guarantees promise.

Our work opens up several questions at the intersection of theory and prac- 
tice. From a theoretical perspective, the natural question is to ask if we can 
design a scalable algorithm that makes polynomially many calls to an NP ora- 
cle. From a practical perspective, our work showcases interesting applications of 
QBF solvers with native XOR support. Since approximate counting and sam- 
pling are known to be inter-reducible, another line of work would be to investigate 
the development of an almost-uniform sampler for MUSes, which can potentially 
benefit from the framework proposed in UniGen [14,16]. Another line of work is 
to extend our MUS counting approach to other constraint domains where MUSes 
find an application, e.g., F can be a set of SMT [25] or LTL [4,8] formulas or 
a set of transition predicates [13,23]. 
Abstract. Given a Boolean formula F, the problem of counting seeks to
estimate the number of solutions of F while the problem of uniform
sampling seeks to sample solutions uniformly at random. Counting and
uniform sampling are fundamental problems in computer science with a
wide range of applications, from constrained random simulation and
probabilistic inference to network reliability and beyond. The past few
years have witnessed the rise of hashing-based approaches that use XOR-based
hashing and employ SAT solvers to solve the resulting CNF formulas
conjuncted with XOR constraints. Since over 99% of the runtime
of hashing-based techniques is spent inside the SAT queries, improving
CNF-XOR solvers has emerged as a key challenge.

In this paper, we identify the key performance bottlenecks in the 
recently proposed BIRD architecture, and we focus on overcoming these 
bottlenecks by accelerating the XOR handling within the SAT solver 
and on improving the solver integration through a smarter use of (par- 
tial) solutions. We integrate the resulting system, called BIRD2, with the 
state of the art approximate model counter, ApproxMC3, and the state 
of the art almost-uniform model sampler UniGen2. Through an extensive 
evaluation over a large benchmark set of over 1896 instances, we observe 
that BIRD2 leads to consistent speed up for both counting and sampling, 
and in particular, we solve 77 and 51 more instances for counting and 
sampling respectively. 


1 Introduction 


A CNF-XOR formula $\varphi$ is represented as a conjunction of two Boolean formulas
$\varphi_{CNF} \wedge \varphi_{XOR}$, wherein $\varphi_{CNF}$ is represented in Conjunctive Normal Form (CNF)
and $\varphi_{XOR}$ is represented as a conjunction of XOR constraints. While owing to
the NP-completeness of CNF, every CNF-XOR formula can be represented as 
a CNF formula with only a linear increase in the size of the resulting formula, 
such a transformation may not be ideal in several scenarios. In particular, it is 


The resulting tools ApproxMC4 and UniGen3 are available open source at
https://github.com/meelgroup/approxmc and https://github.com/meelgroup/unigen.
© The Author(s) 2020 
well known that modern Conflict Driven Clause Learning (CDCL) SAT solvers
perform poorly on XOR formulas represented in CNF form despite the exis- 
tence of efficient polynomial time decision procedures for XOR constraints. Fur- 
thermore, constraints arising from domains such as cryptanalysis and circuits 
can be naturally described as CNF-XOR formulas and these domains served as 
the early inspiration for design of SAT solvers with native support for XORs 
through the usage of Gaussian Elimination. These efforts led to the development
of CryptoMiniSat, a SAT solver that sought to perform Conflict Driven
Clause Learning and Gaussian Elimination in tandem. The architecture of the
early versions of CryptoMiniSat sought to employ disjoint storage of CNF and
XOR clauses, reminiscent of the architecture of SMT solvers.

While CryptoMiniSat was originally designed for cryptanalysis, its ability to 
handle XORs natively has led it to be a fundamental building block of the 
hashing-based techniques for approximate model counting and sampling. Model 
counting, also known as #SAT, and uniform sampling of solutions for Boolean 
formulas are two fundamental problems in computer science with a wide variety 
of applications [1,11,18]. The core idea of hashing-based techniques for approximate
counting and almost-uniform sampling is to employ XOR-based 3-wise
independent hash functions¹ to partition the solution space of F into roughly
equal small cells of solutions. The usage of XOR-based hash functions allows us
to represent a cell as a conjunction of a Boolean formula in conjunctive normal
form (CNF) and XOR constraints, and a SAT solver is invoked to enumerate
solutions inside a randomly chosen cell. The corresponding counting and sampling
algorithms typically employ the underlying solver in an incremental fashion
and invoke the solver thousands of times, which makes runtime efficiency
essential. In this context, Soos and Meel [19] observed that the original
architecture of CryptoMiniSat did not allow a straightforward integration of pre-
and in-processing, which have lately emerged as key techniques in SAT solving.
Accordingly, Soos and Meel [19] proposed a new architecture, called BIRD, that 
relied on the key idea of keeping the XOR constraints in both CNF form and 
XOR form. Soos and Meel integrated BIRD into CryptoMiniSat, and showed that
the state-of-the-art approximate model counter, ApproxMC, when integrated with
the new version of CryptoMiniSat, achieves significant runtime improvements. The
resulting version of ApproxMC was called ApproxMC3. 

Motivated by the success of BIRD in achieving significant runtime performance
improvements, we sought to investigate the key bottlenecks in the runtime
performance of CryptoMiniSat when handling CNF-XOR formulas. Given
the prominent usage of CNF-XOR formulas by the hashing-based techniques,
we study the runtime behavior of CryptoMiniSat for the queries issued by
the hashing-based approximate counters and samplers, ApproxMC3 and UniGen2 
respectively. Our investigation leads us to make five core technical contributions. 
The first four contributions are architectural advances in the handling of
CNF-XOR formulas, while the fifth contribution focuses on algorithmic
improvements in the hashing-based techniques for counting and sampling:


1 While approximate counting techniques [10] only require 2-wise independent hash
functions, hashing-based sampling techniques [6,9] require 3-wise independent hash
functions.


1. Matrix row handling improvements for efficient propagation and conflict 
checking of XOR constraints 

2. XOR constraint detaching from the standard unit propagation system for 
higher unit propagation speed 

3. Lazy reason clause generation to reduce reason generation overhead for 
unused reasons generated from XOR constraints 

4. Allowing partial solution extraction by the SAT solver 

5. Intelligent reuse of solutions by hashing-based techniques to reduce the 
number of SAT calls 


We integrate these improvements into the BIRD framework; the resulting
framework is called BIRD2. The BIRD2 framework is applied to the state of the
art approximate model counter, ApproxMC3, and to the almost-uniform sam- 
pler UniGen2 [6,9]. The resulting counter and sampler are called ApproxMC4 
and UniGen3 respectively. We conducted an extensive empirical evaluation 
with over 1800 benchmarks arising from diverse domains with computational 
effort totalling 50,000 CPU hours. With a timeout of 5000 s, ApproxMC3 
and UniGen2+BIRD were able to solve only 1148 and 1012 benchmarks, while 
ApproxMC4 and UniGen3 solved 1225 and 1063 benchmarks respectively. Furthermore,
we observe a consistent speedup for most of the benchmarks that could
be solved by ApproxMC3 and UniGen2+BIRD. In particular, the PAR-2² score
improved from 4146 with ApproxMC3 to 3701 with ApproxMC4. Similarly, the
PAR-2 score improved from 4878 with UniGen2+BIRD to 4574 with
UniGen3.
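The PAR-2 score used above is a penalized average runtime in which every unsolved benchmark is charged twice the time limit. A minimal sketch of the computation, with hypothetical runtimes and a hypothetical function name, where `None` stands for a benchmark that was not solved:

```python
from typing import List, Optional

def par2(runtimes: List[Optional[float]], timeout: float) -> float:
    """Penalized average runtime: unsolved runs are charged 2 * timeout."""
    penalized = [2 * timeout if t is None or t > timeout else t for t in runtimes]
    return sum(penalized) / len(penalized)

# Four hypothetical benchmarks with a 5000 s limit; the third one timed out.
print(par2([100.0, 2500.0, None, 40.0], timeout=5000.0))
```

Because a timeout costs 10000 s here, a tool that solves a few more instances can lower its PAR-2 score noticeably even if its runtimes on the solved instances are unchanged.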


2 Notations and Preliminaries 


Let $F$ be a Boolean formula in conjunctive normal form (CNF) and $\mathsf{Vars}(F)$ the
set of variables in $F$. Unless otherwise stated, we use $n$ to denote the number of
variables in $F$, i.e., $n = |\mathsf{Vars}(F)|$. An assignment of truth values to the variables
in $\mathsf{Vars}(F)$ is called a satisfying assignment or witness of $F$ if it makes $F$ evaluate
to true. We denote the set of all witnesses of $F$ by $\mathsf{sol}(F)$. If we are only
interested in a subset of variables $S \subseteq \mathsf{Vars}(F)$, we will use $\mathsf{sol}(F)_{\downarrow S}$ to indicate
the projection of $\mathsf{sol}(F)$ on $S$.

The problem of propositional model counting is to compute $|\mathsf{sol}(F)|$ for a
given CNF formula $F$. A probably approximately correct (or PAC) counter is a
probabilistic algorithm $\mathsf{ApproxCount}(\cdot, \cdot, \cdot)$ that takes as inputs a formula $F$, a
tolerance $\varepsilon > 0$, and a confidence $1 - \delta \in (0, 1]$, and returns a count $c$ with
$(\varepsilon, \delta)$-guarantees, i.e., $\Pr\big[\, |\mathsf{sol}(F)| / (1 + \varepsilon) \le c \le (1 + \varepsilon) |\mathsf{sol}(F)| \,\big] \ge 1 - \delta$. Projected


? PAR-2 score, that is, penalized average runtime, assigns a runtime of two times the 
time limit (instead of a “not solved” status) for each benchmark not solved by a 
tool. 
model counting is defined analogously using $\mathsf{sol}(F)_{\downarrow S}$ instead of $\mathsf{sol}(F)$, for a
given sampling set $S \subseteq \mathsf{Vars}(F)$.

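A uniform sampler outputs a solution $y \in \mathsf{sol}(F)$ such that $\Pr[y \text{ is output}] =
\frac{1}{|\mathsf{sol}(F)|}$. An almost-uniform sampler relaxes the guarantee of uniformity and, in
particular, ensures that $\frac{1}{(1+\varepsilon)|\mathsf{sol}(F)|} \le \Pr[y \text{ is output}] \le \frac{1+\varepsilon}{|\mathsf{sol}(F)|}$.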


Universal Hash Functions. Let $n, m \in \mathbb{N}$ and $H(n,m) = \{h : \{0,1\}^n \to
\{0,1\}^m\}$ be a family of hash functions mapping $\{0,1\}^n$ to $\{0,1\}^m$. We use
$h \xleftarrow{R} H(n,m)$ to denote the probability space obtained by choosing a function
$h$ uniformly at random from $H(n,m)$. To measure the quality of a hash
function we are interested in the set of elements of $S$ mapped to $\alpha$ by $h$, denoted
$\mathsf{Cell}_{\langle S,h,\alpha \rangle}$, and its cardinality, i.e., $|\mathsf{Cell}_{\langle S,h,\alpha \rangle}|$. To avoid cumbersome terminology,
we abuse notation slightly and we use $\mathsf{Cell}_{\langle F,m \rangle}$ (resp. $\mathsf{Cnt}_{\langle F,m \rangle}$) as shorthand for
$\mathsf{Cell}_{\langle \mathsf{sol}(F),h,\alpha \rangle}$ (resp. $|\mathsf{Cell}_{\langle \mathsf{sol}(F),h,\alpha \rangle}|$).


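Definition 1. A family of hash functions $H(n,m)$ is k-wise independent³ if
$\forall \alpha_1, \alpha_2, \ldots, \alpha_k \in \{0,1\}^m$ and for distinct $y_1, y_2, \ldots, y_k \in \{0,1\}^n$, $h \xleftarrow{R} H(n,m)$,

$$\Pr[(h(y_1) = \alpha_1) \wedge (h(y_2) = \alpha_2) \wedge \ldots \wedge (h(y_k) = \alpha_k)] = \left(\frac{1}{2^m}\right)^k \qquad (1)$$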


Note that every k-wise independent hash family is also $(k-1)$-wise independent.


Prefix Slicing. While universal hash families have nice concentration bounds, 
they are not adaptive, in the sense that one cannot build on previous queries. In 
several applications of hashing, the dependence between different queries can be 
exploited to extract improvements in theoretical complexity and runtime perfor- 
mance. Thus, we are typically interested in prefix slices of hash functions [10] as 
follows. 


Definition 2. For every $m \in \{1, \ldots, n\}$, the $m$-th prefix-slice of $h$, denoted $h^{(m)}$,
is a map from $\{0,1\}^n$ to $\{0,1\}^m$, such that $h^{(m)}(y)[i] = h(y)[i]$, for all $y \in
\{0,1\}^n$ and for all $i \in \{1, \ldots, m\}$. Similarly, the $m$-th prefix-slice of $\alpha$, denoted
$\alpha^{(m)}$, is an element of $\{0,1\}^m$ such that $\alpha^{(m)}[i] = \alpha[i]$ for all $i \in \{1, \ldots, m\}$.


Explicit Hash Functions. The most common explicit hash family used in 
state of the art sampling and counting techniques is based on random XOR 
constraints. Viewing $\mathsf{Vars}(F)$ as a vector $x$ of dimension $n \times 1$, we can represent
the hash family as follows: Let $H_{xor}(n, m) = \{h : \{0,1\}^n \to \{0,1\}^m\}$ be the
family of functions of the form $h(x) = Mx + b$ with $M \in \mathbb{F}_2^{m \times n}$ and $b \in \mathbb{F}_2^{m \times 1}$,
where the entries of $M$ and $b$ are independently generated according to the


3 The phrase strongly 2-universal is also used to refer to 2-wise independent as noted 
by Vadhan in [23], although the concept of 2-universal hashing proposed by Carter 
and Wegman [4] only required that $\Pr[h(x) = h(y)] \le \frac{1}{2^m}$.
Bernoulli distribution with probability 1/2. Observe that $h^{(m)}(x)$ can be written
as $h^{(m)}(x) = M^{(m)} x + b^{(m)}$, where $M^{(m)}$ denotes the submatrix formed by the
first $m$ rows and $n$ columns of $M$ and $b^{(m)}$ is the first $m$ entries of the vector
$b$. It is well known that $H_{xor}$ is 3-wise independent [9].
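To make the XOR-based family and its prefix slices concrete, the sketch below draws h(x) = Mx + b over F_2 with independent fair-coin entries and evaluates the m-th prefix slice by using only the first m rows. The function names are hypothetical and the code is meant as an illustration only; each row of M together with the matching entry of b corresponds to one random XOR constraint over the variables.

```python
import random

def sample_xor_hash(n, m, rng):
    """Draw h(x) = Mx + b over F_2; every entry is an independent fair coin."""
    M = [[rng.randint(0, 1) for _ in range(n)] for _ in range(m)]
    b = [rng.randint(0, 1) for _ in range(m)]
    return M, b

def apply_hash(M, b, x, m=None):
    """Evaluate the m-th prefix slice of h at x; m=None uses all rows of M."""
    rows = len(M) if m is None else m
    return tuple((sum(M[i][j] & x[j] for j in range(len(x))) + b[i]) % 2
                 for i in range(rows))

rng = random.Random(0)
M, b = sample_xor_hash(n=5, m=3, rng=rng)
x = (1, 0, 1, 1, 0)
full = apply_hash(M, b, x)
# The 2nd prefix slice agrees with h on the first two output bits.
assert apply_hash(M, b, x, m=2) == full[:2]
print(full)
```

Because a prefix slice reuses the leading rows, moving from m to m + 1 only adds one XOR constraint, which is what lets later queries build on earlier ones.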


3 Background 


The general idea of hashing-based model counting and sampling is to use a hash 
function from a suitable family, e.g. $H_{xor}$, to divide the solution space into cells
that are sufficiently small such that all solutions within a cell can be enumerated 
efficiently. Given such a cell, its size can then be used to estimate the total count 
of solutions or we can return a random element of this small cell to produce a 
sample. Hence, hashing-based sampling and counting are closely related. 


3.1 Hashing-Based Model Counting 


The seminal work of Valiant [24] established that #SAT is #P-complete. 
Toda [22] showed that the entire polynomial hierarchy is contained inside the 
complexity class defined by a polynomial time Turing machine equipped with 
#P oracle. Building on Carter and Wegman’s [4] seminal work of universal hash 
functions, Stockmeyer [21] proposed a probabilistic polynomial time procedure 
relative to an NP oracle to obtain an $(\varepsilon, \delta)$-approximation of $|\mathsf{sol}(F)|$.

The core theoretical idea of the hashing-based approximate solution counting
framework proposed in ApproxMC [8], building on Stockmeyer [21], is to
employ 2-universal hash functions to partition the solution space, denoted by
$\mathsf{sol}(F)$ for a formula $F$, into roughly equal small cells, wherein a cell is called
small if its number of solutions is less than or equal to a pre-computed threshold, thresh.
An NP oracle is employed to check if a cell is small by enumerating solutions
one-by-one until either there are no more solutions or we have already enumerated
thresh + 1 solutions. In practice, a SAT solver is used to realize the NP
oracle. To ensure polynomially many calls to the oracle, thresh is set to be polynomial
in the input parameter $\varepsilon$. To determine the right number of cells, i.e., the
value of $m$ for $H(n,m)$, a search procedure is invoked. Finally, the subroutine,
called ApproxMCCore, computes the estimate as the number of solutions in the
randomly chosen cell scaled by the number of cells (i.e., $2^m$). To achieve probabilistic
amplification of the confidence, multiple invocations of the underlying
subroutine, ApproxMCCore, are performed with the final count computed as the
median of estimates returned by ApproxMCCore.
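The counting scheme just described can be mimicked on a toy scale. The sketch below is not ApproxMC: it replaces the SAT solver by a scan over an explicit list of solutions, uses a plain linear search over m, and picks thresh and the number of iterations arbitrarily; all names are hypothetical. It only illustrates the three ingredients: bounded enumeration inside a random cell, scaling by 2^m, and the median over several runs.

```python
import random
from itertools import product
from statistics import median

def cell_size(solutions, M, b, alpha, m, cap):
    """Count solutions x whose first m hash bits equal alpha, stopping at cap."""
    count = 0
    for x in solutions:
        if all((sum(M[i][j] & x[j] for j in range(len(x))) + b[i]) % 2 == alpha[i]
               for i in range(m)):
            count += 1
            if count == cap:
                break
    return count

def approx_core(solutions, n, thresh, rng):
    """One estimate: first m whose cell is small, scaled by 2^m (None on failure)."""
    M = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n)]
    b = [rng.randint(0, 1) for _ in range(n)]
    alpha = [rng.randint(0, 1) for _ in range(n)]
    for m in range(n + 1):
        size = cell_size(solutions, M, b, alpha, m, cap=thresh + 1)
        if size <= thresh:
            # An empty cell after hashing carries no information: failed run.
            return size * 2 ** m if (size > 0 or m == 0) else None
    return None

def approx_count(solutions, n, thresh=20, iterations=9, seed=0):
    """Median of the successful estimates over several independent runs."""
    rng = random.Random(seed)
    runs = [approx_core(solutions, n, thresh, rng) for _ in range(iterations)]
    return median(e for e in runs if e is not None)

# Toy "formula" x1 OR x2 over 10 variables; it has 768 solutions.
solutions = [x for x in product((0, 1), repeat=10) if x[0] or x[1]]
print(approx_count(solutions, n=10))
```

When the solution set already fits under thresh, the search stops at m = 0 and the exact count is returned; otherwise the result is a randomized estimate whose quality depends on thresh and on the number of iterations.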

Two key algorithmic improvements proposed in ApproxMC2 [10] are signifi- 
cant to practical performance: (1) the search for the right number of cells can be 
performed via galloping search, and (2) one can first perform linear search over a 
small enough interval (chosen to be of size 7) around the value of m found in the 
previous iteration of ApproxMCCore. The practical profiling of ApproxMC2 reveals 
that linear search is sufficient after the first invocation of ApproxMCCore. Note 
that the linear search seeks to identify a value of $m$ such that $\mathsf{Cnt}_{\langle F,m-1 \rangle} \ge$ thresh
and $\mathsf{Cnt}_{\langle F,m \rangle} <$ thresh for an appropriately chosen thresh. ApproxMC is currently
in its third generation: ApproxMC3.
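The two search strategies can be sketched as follows. The code assumes that the cell count is non-increasing in m, which holds when every query uses a prefix slice of the same hash function, and that the cell at m = 0 is not small. The callback `cell_count`, the function names, and the symmetric reading of "an interval of size 7" are assumptions made for this illustration.

```python
def galloping_search(cell_count, n, thresh):
    """Smallest m in [1, n] with cell_count(m) < thresh, or None if there is none."""
    lo, hi = 0, 1
    # Phase 1: double m until the cell becomes small or n is reached.
    while hi < n and cell_count(hi) >= thresh:
        lo, hi = hi, min(2 * hi, n)
    if cell_count(hi) >= thresh:
        return None
    # Phase 2: binary search in (lo, hi]; cell_count(lo) >= thresh > cell_count(hi).
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if cell_count(mid) < thresh:
            hi = mid
        else:
            lo = mid
    return hi

def linear_search(cell_count, n, thresh, prev_m, window=7):
    """Look for the boundary m within a small window around the previous m."""
    lo = max(1, prev_m - window // 2)
    hi = min(n, prev_m + window // 2)
    for m in range(lo, hi + 1):
        if cell_count(m - 1) >= thresh and cell_count(m) < thresh:
            return m
    return None

# Idealized cell sizes: 1000 solutions halved by every additional XOR.
ideal = lambda m: 1000 >> m
print(galloping_search(ideal, n=20, thresh=10))
print(linear_search(ideal, n=20, thresh=10, prev_m=6))
```

In this idealized setting both searches return m = 7, since the cell holds 15 solutions at m = 6 and 7 solutions at m = 7. If the linear search returns None, a caller could fall back to the galloping search.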


3.2 Hashing-Based Sampling 


Jerrum, Valiant, and Vazirani [14] showed that approximate counting and
almost-uniform sampling are polynomially inter-reducible. Building on Jerrum
et al.’s result, Bellare, Goldreich, and Petrank [2] proposed a probabilistic uniform
generator that makes polynomially many calls to an NP oracle where
each NP query is the input formula $F$ conjuncted with constraints encoding
a degree-$n$ polynomial representing $n$-wise independent hash functions, where
$n$ is the number of variables in $F$. The practical implementation of Bellare
et al.’s technique did not scale beyond a few tens of variables. Chakraborty, Meel,
and Vardi [7,9], sought to combine the inter-reducibility and the usage of inde- 
pendent hashing, and proposed a hashing-based framework, called UniGen, that 
employs 3-wise independent hashing and makes polynomially many calls to an 
NP oracle. 

The core theoretical idea of the hashing-based sampling framework, proposed 
in UniGen, exploits the close relationship between counting and sampling. UniGen 
first invokes ApproxMC to compute an estimate of the number of solutions of the 
given formula F. It then uses the count to determine the number of cells that the 
solution space should be partitioned into using 3-wise independent hash functions.
At this point, it is worth mentioning that state-of-the-art hashing-based
sampling techniques employ 3-wise independent hash functions. Fortunately, the family of
hash functions, $H_{xor}$, is also known to be 3-wise independent. Thereafter, similar
to ApproxMC, a linear search over a small enough interval (chosen to be of
size 4) is invoked to find the right value of $m$ where a randomly chosen cell’s
size is within the desired bounds. For such a cell, all its solutions are enumerated
and one of the solutions is randomly chosen. Again, similar to ApproxMC2
(and ApproxMC3), the linear search seeks to identify a value of $m$ such that
$\mathsf{Cnt}_{\langle F,m-1 \rangle} \ge$ thresh and $\mathsf{Cnt}_{\langle F,m \rangle} <$ thresh for an appropriately chosen thresh.
UniGen is currently in its second generation: UniGen2 [6].


3.3 The Underlying SAT Solver 


The underlying SAT solver is invoked through subroutine BoundedSAT, which 
is implemented using CryptoMiniSat. Formally, BoundedSAT takes as inputs a 
formula $F$, a threshold thresh, and a sampling set $S$, and returns a subset $Y$
of $\mathsf{sol}(F)_{\downarrow S}$, such that $|Y| = \min(\text{thresh}, |\mathsf{sol}(F)_{\downarrow S}|)$. The formula $F$ consists of
the original formula, which we want to count or sample, conjuncted with a set
of XOR constraints defined through a hash function sampled from the family
$H_{xor}$. We henceforth denote such formulas as CNF-XOR formulas. Note that
the efficient encoding of XOR constraints into CNF requires the introduction of 
new variables and hence the sampling set S usually does not contain all variables 
in F. 
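The input/output contract of BoundedSAT can be illustrated with a brute-force stand-in. This is not how CryptoMiniSat works: the sketch enumerates all assignments, which is only feasible for a handful of variables, and the function name and data layout (DIMACS-style integer literals, XORs as pairs of a variable list and a right-hand side) are assumptions made for the example.

```python
from itertools import product

def bounded_sat(clauses, xors, n, thresh, sampling_set):
    """Return up to thresh distinct projections on sampling_set of CNF-XOR solutions.

    clauses: lists of non-zero ints over variables 1..n (negative = negated).
    xors: (variables, rhs) pairs meaning the XOR of the variables equals rhs.
    Assumes thresh >= 1.
    """
    found = set()
    for bits in product((0, 1), repeat=n):
        cnf_ok = all(any((bits[abs(l) - 1] == 1) == (l > 0) for l in c)
                     for c in clauses)
        xor_ok = all(sum(bits[v - 1] for v in vs) % 2 == rhs for vs, rhs in xors)
        if cnf_ok and xor_ok:
            found.add(tuple(bits[v - 1] for v in sampling_set))
            if len(found) == thresh:
                break
    return found

# (x1 OR x2) AND (x1 XOR x2 XOR x3 = 1), projected on S = {x1, x2}.
Y = bounded_sat([[1, 2]], [([1, 2, 3], 1)], n=3, thresh=10, sampling_set=[1, 2])
print(sorted(Y))
```

With thresh = 10 all three projected solutions are returned; with thresh = 2 the enumeration stops early and exactly two are returned, matching |Y| = min(thresh, number of projected solutions).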
As is consistent with prior studies, profiling of ApproxMC3 and UniGen2
reveals that over 99% of the runtime is spent in BoundedSAT.
Therefore recent efforts have focused on improving BoundedSAT. Soos and 
Meel [19] sought to address the performance of the underlying SAT solver by 
proposing a new architecture, called BIRD, that allows the usage of in- and 
pre-processing techniques for a Gauss Jordan Elimination (GJE)-augmented 
SAT solver. ApproxMC2, integrated with BIRD, called ApproxMC3, gave up to 
three orders of magnitude runtime performance improvement. Such significant 
improvements are rare in the SAT community. Encouraged by Soos and Meel’s 
observations, we seek to build on top of BIRD to achieve an even tighter inte- 
gration of the underlying SAT solver and ApproxMC3/UniGen2. 


BIRD: Blast, Inprocess, Recover, and Destroy. Pre- and inprocessing tech- 
niques are known to have a large impact on the runtime performance of SAT 
solvers. However, earlier Gaussian elimination architectures were unable to perform
these techniques. Motivated by this inability, Soos and Meel [19] proposed
a new framework, called BIRD, that allows usage of inprocessing techniques for 
GJE-augmented CDCL solvers. The key idea of BIRD is to blast XOR clauses 
into CNF clauses so that any technique working solely on CNF clauses does not 
violate soundness of the solver. To perform Gauss-Jordan elimination, one needs 
efficient algorithms and data structures to extract XORs from CNF. The entire 
framework is presented as follows: 


BIRD: Blast, In-process, Recover, and Destroy 


Step 1 Blast XOR clauses into normal CNF clauses 

Step 2 Inprocess (and pre-process) over CNF clauses 

Step 3 Recover simplified XOR clauses 

Step 4 Perform CDCL on CNF clauses with on-the-fly Gauss-Jordan Elimi- 
nation (GJE) on XOR clauses until inprocessing is scheduled 

Step 5 Destroy XOR clauses and goto Step 2 


The above loop terminates as soon as a satisfying assignment is found or the 
formula is proven UNSAT. The BIRD architecture separates inprocessing from 
CDCL solving and therefore every sound inprocessing step can be employed. 


4 Technical Contributions to CNF-XOR Solving 


Inspired by the success of BIRD, we seek to further improve the underlying SAT 
solver’s architecture based on the queries generated by the hashing-based tech- 
niques. To this end, we relied on extensive profiling of CryptoMiniSat augmented 
with BIRD to identify the key performance bottlenecks, and propose solutions 
to overcome some of the challenges. 
4.1 Detaching XOR Clauses from Watch-Lists


Given a formula $F$ in CNF, the recovery phase of BIRD attempts to construct
a set of XORs, $H$, such that $F \rightarrow H$. As detailed in [19], the core technique
for recovery of an XOR of size $k$ is to establish whether the required
$2^{k-1}$ combinations of clauses are implied by the existing CNF clauses. For
example, the XOR $x_1 \oplus x_2 \oplus x_3 = 0$ (where $k = 3$) can be recovered
if the existing set of CNF clauses implies the following $4 (= 2^{3-1})$ clauses:
$(x_1 \vee x_2 \vee \neg x_3) \wedge (x_1 \vee \neg x_2 \vee x_3) \wedge (\neg x_1 \vee x_2 \vee x_3) \wedge (\neg x_1 \vee \neg x_2 \vee \neg x_3)$. To
this end, the first stage of the recovery phase of BIRD iterates over the CNF
clauses and for a given clause, called base_cl, of size $k$, searches whether the
remaining $2^{k-1} - 1$ clauses are implied as well, in which case the resulting XOR
is added. It is worth noting that a clause can imply multiple clauses over the
set of variables of base_cl. For example, if base_cl $= (x_1 \vee \neg x_2 \vee x_3)$, then the
clause $(\neg x_1)$ would imply the two clauses $(\neg x_1 \vee \neg x_2 \vee \neg x_3)$ and $(\neg x_1 \vee x_2 \vee x_3)$.
Note that given a base_cl, we are only interested in clauses over the variables in
base_cl.

During blasting of XORs into CNF, XORs are first cut into smaller XORs 
by introducing auxiliary variables. Hence, the first stage of recovery phase must 
recover these smaller XORs and the second phase reconstructs the larger XORs 
by XOR-ing two XORs together if they differ only on one variable, referred to 
as a clash variable. For example, $x_1 \oplus x_2 \oplus x_3 = 0$ and $x_3 \oplus x_4 \oplus x_5 = 1$ can be
XOR-ed together over clash variable $x_3$ to obtain $x_1 \oplus x_2 \oplus x_4 \oplus x_5 = 1$.
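Both operations, blasting an XOR into its 2^(k-1) clauses and combining two XORs over a clash variable, are easy to state in code. The sketch below uses DIMACS-style integer literals and hypothetical function names; it covers the direct encoding of one short XOR and does not model the cutting of long XORs with auxiliary variables.

```python
from itertools import product

def xor_to_cnf(variables, rhs):
    """Blast x_1 XOR ... XOR x_k = rhs into its 2^(k-1) CNF clauses.

    Each clause rules out one assignment of the wrong parity.
    """
    clauses = set()
    for bits in product((0, 1), repeat=len(variables)):
        if sum(bits) % 2 != rhs:  # this assignment violates the XOR
            # Build the clause that is false exactly under this assignment.
            clauses.add(frozenset(-v if bit else v
                                  for v, bit in zip(variables, bits)))
    return clauses

def combine(xor_a, xor_b):
    """XOR together two constraints that share exactly one (clash) variable."""
    (vars_a, rhs_a), (vars_b, rhs_b) = xor_a, xor_b
    clash = set(vars_a) & set(vars_b)
    assert len(clash) == 1, "expected exactly one clash variable"
    return sorted(set(vars_a) ^ set(vars_b)), rhs_a ^ rhs_b

print(sorted(sorted(c, key=abs) for c in xor_to_cnf([1, 2, 3], 0)))
print(combine(([1, 2, 3], 0), ([3, 4, 5], 1)))
```

For x1 XOR x2 XOR x3 = 0 this yields the four clauses of the running example, and combining it with x3 XOR x4 XOR x5 = 1 over x3 gives x1 XOR x2 XOR x4 XOR x5 = 1.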

Since BIRD performs CDCL in tandem with Gauss-Jordan elimination, it is 
worth noting that the Gauss-Jordan elimination (GJE)-based decision procedure 
is sound and complete, i.e., all unit propagations and conflicts implied by the 
given set of XORs would be discovered by a GJE-based decision procedure. 
For the initial formula (in CNF) F and the recovered set of XORs, H, if a 
set of CNF clauses G is implied by H, then presence or absence of G does 
not affect soundness and completeness of GJE-augmented CDCL engine. Our 
extensive profiling of the BIRD framework integrated in CryptoMiniSat revealed 
a significant time spent in examination of clauses in G during unit propagation. 
To this end, we sought to ask how to design an efficient technique to find all the 
CNF clauses implied by the recovered XORs. These clauses could be detached 
from unit propagation without any negative effect on correctness of execution. 

A straightforward approach would be to mark all the clauses during the 
blasting phase of XORs into CNF. However, the incompleteness of the recovery 
phase of BIRD does not guarantee that all such marked clauses are indeed implied 
by the recovered set of XORs. Another challenge in the search for detachable 
clauses arises due to construction of larger XORs by combining smaller XORs. 
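For example, while $x_1 \oplus x_2 \oplus x_3 = 0$ and $x_3 \oplus x_4 \oplus x_5 = 1$ imply $(x_1 \vee x_2 \vee \neg x_3)$
and $(x_3 \vee x_4 \vee x_5)$, the combined XOR $x_1 \oplus x_2 \oplus x_4 \oplus x_5 = 1$ does not imply
$(x_1 \vee x_2 \vee \neg x_3)$ and $(x_3 \vee x_4 \vee x_5)$.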
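The claim in this example can be checked by brute force over the five variables. The helper names below are hypothetical; the check simply tests whether every assignment that satisfies the given XORs also satisfies the clause.

```python
from itertools import product

def xor_holds(bits, variables, rhs):
    """True iff the XOR of the listed variables equals rhs under bits."""
    return sum(bits[v - 1] for v in variables) % 2 == rhs

def clause_holds(bits, clause):
    """True iff at least one DIMACS-style literal of the clause is satisfied."""
    return any((bits[abs(l) - 1] == 1) == (l > 0) for l in clause)

def implies(xors, clause, n=5):
    """True iff every assignment satisfying all xors also satisfies clause."""
    return all(clause_holds(bits, clause)
               for bits in product((0, 1), repeat=n)
               if all(xor_holds(bits, vs, rhs) for vs, rhs in xors))

small = [([1, 2, 3], 0), ([3, 4, 5], 1)]   # the two short XORs
combined = [([1, 2, 4, 5], 1)]             # their sum over clash variable x3

print(implies(small, [1, 2, -3]), implies(small, [3, 4, 5]))        # True True
print(implies(combined, [1, 2, -3]), implies(combined, [3, 4, 5]))  # False False
```

The combined XOR no longer constrains x3, so assignments such as x1 = x2 = 0, x3 = x4 = 1, x5 = 0 satisfy it while falsifying the first clause; this is why clauses of the short XORs cannot be detached on the strength of the long XOR alone.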

Two core insights inform our design of the modification of the recovery phase
and search for detachable clauses. Firstly, given a base clause base_cl, if a clause
cl participates in the recovery of XORs over the variables in base_cl, then cl is
implied by the recovered XOR if the number of variables in cl is the same as
that of base_cl. We call such a clause cl a fully participating clause. Secondly, let
$G_1$ and $G_2$ be the sets of CNF clauses implied by two XORs $q_1$ and $q_2$ that share
exactly one variable, say $x_i$. Let $U = (\mathsf{Vars}(q_1) \cup \mathsf{Vars}(q_2)) \setminus \{x_i\}$. Let $q_3$ be the XOR
obtained by XORing together $q_1$ and $q_2$; then $\mathsf{sol}(q_3)_{\downarrow U} \subseteq \mathsf{sol}(G_1 \wedge G_2)_{\downarrow U}$ if $x_i$
does not appear in the remaining clauses, i.e., $x_i \notin \mathsf{Vars}(F \setminus (G_1 \cup G_2))$.

The above two insights lead us to design a modified recovery and detachment 
phase as follows. During recovery, we add every fully participating clause to the 
set of detachable clauses $D$. Let $U = S \cup (\mathsf{Vars}(D) \cap \mathsf{Vars}(F \setminus D))$. Then, the
recovery of longer XORs is only performed over clash variables that do not 
belong to U. We then detach the clauses in D from watch-lists during GJE- 
augmented CDCL phase, mark the clash variables as non-decision variables, 
perform CDCL, and only reattach the clauses and re-set the clash variables to 
be decision variables after the Destroy phase of BIRD. 
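The set of variables that must not be used as clash variables can be computed with plain set operations. The sketch below is only an illustration of that bookkeeping, with clauses represented as frozensets of DIMACS-style literals and hypothetical function names; the toy formula consists of the blasted clauses of x1 XOR x2 XOR x3 = 0 and x3 XOR x4 XOR x5 = 1 plus one unrelated clause (x5 OR x6).

```python
def clause_vars(clauses):
    """Variables occurring in a collection of clauses."""
    return {abs(lit) for clause in clauses for lit in clause}

def protected_vars(formula, detachable, sampling_set):
    """S united with the variables shared by D and the rest of the formula."""
    rest = formula - detachable
    return set(sampling_set) | (clause_vars(detachable) & clause_vars(rest))

fs = frozenset
D = {fs({1, 2, -3}), fs({1, -2, 3}), fs({-1, 2, 3}), fs({-1, -2, -3}),   # x1^x2^x3 = 0
     fs({3, 4, 5}), fs({3, -4, -5}), fs({-3, 4, -5}), fs({-3, -4, 5})}   # x3^x4^x5 = 1
F = D | {fs({5, 6})}
U = protected_vars(F, D, sampling_set={1, 2})
print(sorted(U), 3 in U)
```

Here x5 is protected because it also occurs outside D, whereas x3 occurs only in detachable clauses and is not in S, so the two short XORs may be combined over x3.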

If the formula is satisfiable, the design of the solver is such that the solution 
is always found during the GJE-augmented CDCL solving phase. Since clauses 
in D are detached and the clash variables are set to be not decided on during 
this phase, the clash variables are always left unassigned. As discussed below, 
however, we only need to extract solutions over the sampling set S, therefore 
the solution found is adequate as-is, without the clash variables, which are by 
definition not over S as they are only introduced for having short encodings of 
XORs into CNF. 

Conceptually, this approach reconciles the overhead introduced by BIRD, i.e., 
that XOR constraints are also present as regular clauses, with the neatness of 
the original CryptoMiniSat that stored XOR and regular constraints in different 
data structures. This reconciliation takes the best of both worlds. 


4.2 Fast Propagation/Conflict Detection and Reason Generation 


We identified two key bottlenecks in the current GJE component of the BIRD
framework integrated in CryptoMiniSat, which we sought to improve upon. To
put our contributions in context, we first describe the technical details of
the core data structures and algorithms.


Han-Jiang’s GJE. To perform Gaussian elimination on a set of XORs, the 
XORs are represented as a matrix where each row represents an XOR and each
column represents a variable. The framework proposed by Soos et al. updates 
the matrix whenever a variable is assigned and removes the assigned variable 
from all XORs by zeroing out the corresponding column. However, using the 
matrix in such a way involves significant memory copying during backtracking 
due to having to revert the matrix to a previous version. 

To avoid the overhead, Han and Jiang proposed a new framework [13] build- 
ing on Simplex-like techniques that performs Gauss-Jordan elimination, i.e., 
using reduced row echelon form instead of row echelon form. The key data struc- 
ture innovation was to employ a two-watched variable scheme for each row of the 
matrix wherein the watched variables are called basic and non-basic variables.
Essentially, the basic variables are the variables on the diagonal of a matrix in 
reduced row echelon form and hence every row has exactly one basic variable 
and the basic variable only occurs in one row. Similar to standard CDCL solv- 
ing, when a matrix row’s watch is assigned, the GJE component must determine 
whether the row (1) propagates, (2) needs to assign a new watch, (3) is satisfied, 
or (4) is conflicted. It is worth recalling that a row would propagate if all except 
one variable has been assigned and would conflict or be satisfied if all the vari- 
ables in a row have been assigned. Furthermore, we need to find a new watch if 
a watched variable was assigned and there is more than one unassigned variable 
left. If a basic variable is replaced by a new watch then the two corresponding 
columns are swapped and the reduced row echelon form is recomputed. In prac- 
tice swapping columns is avoided by keeping track of which column is a basic 
variable. 

For propagation, checking for conflict, and conflict clause generation Han- 
Jiang proposed a sequential walk through a row that eagerly computes the reason 
clause and stops when it encounters a new watch variable or reaches the
end of the row. At that point, the system (1) knows whether the row is satisfied, 
propagating, or conflicted, and (2) if not satisfied, has eagerly computed the 
reason clause for the propagation or the conflict. 

For general benchmarks where XOR constraints do not play an influential
role in determining satisfiability of the underlying problem, the GJE component
can account for as little as 10% of the entire solving time. However, for
formulas generated by hashing-based techniques, our profiling demonstrated
several cases where the Gaussian elimination component could be very time
consuming, taking up to 90% of solving time.

While the choice of GJE combined with clever data structure maintenance led 
to significant improvements of the runtime of Gaussian Elimination component, 
our profiling identified two processes as key bottlenecks: propagation checking 
and reason generation. We next discuss our proposed algorithmic improvements 
that achieve significant runtime improvement by addressing these bottlenecks. 


Tinted Fast Unit Propagation. The core idea to achieve faster propagation 
is based on bit-level parallelism via the different native operations supported 
by modern CPUs. In particular, modern CPUs provide native support for basic 
bitwise operations on bit fields such as AND, INVERT, hamming weight com- 
putation (i.e., the number of non-zero entries), and find first set (i.e., finding 
the index of first non-zero bit). Given the widespread support of SIMD exten- 
sions, the above operations can be performed at the rate of 128...512 bits per 
instruction. Therefore, the core data structure represents every 0-1 vector as a 
bit field. 

A set of XORs over $n$ variables $x_1, \ldots, x_n$ is represented as $Mx = b$ for a
0-1 matrix $M$ of size $m \times n$, a 0-1 vector $b$ of length $m$, and $x = (x_1, \ldots, x_n)^T$.
Consider the $i$-th row of $M$, denoted by $M[i]$. Let $a$ be a 0-1 vector of size
$n$ such that $a[j] = 1$ if the variable $x_j$ is assigned True or False, and 0 in case
$x_j$ is unassigned. Let $v$ be a 0-1 vector of size $n$ such that $v[j] = 1$ if $x_j$ is
set to True and 0 otherwise. Let $\bar{z}$ be the bitwise inverse of a 0-1 vector $z$
and $\&$ be the bitwise AND operation. Let $W_{unass} = \mathrm{hamming\_weight}(\bar{a} \,\&\, M[i])$ be
the number of unassigned variables in the XOR represented by row $i$, and
$W_{val} = \mathrm{hamming\_weight}(v \,\&\, M[i])$ the number of satisfied variables. We view
the computation of $W_{unass}$ and $W_{val}$ as viewing the world of $M$ through the
tinted lens of $v$ and $\bar{a}$. Now, the following holds:


1. Row $i$ is satisfied if and only if $W_{unass} = 0$ and $(W_{val} \bmod 2) \oplus b[i] = 0$.

2. Row $i$ causes a conflict if and only if $W_{unass} = 0$ and $(W_{val} \bmod 2) \oplus b[i] = 1$.

3. Row $i$ propagates if and only if $W_{unass} = 1$. The propagated variable is the one
that corresponds to the column with the only bit set in $\bar{a} \,\&\, M[i]$. The value
propagated is $(W_{val} \bmod 2) \oplus b[i]$.

4. A new watch needs to be found for row $i$ if and only if $W_{unass} \ge 2$. The new
watch is any one of the variables corresponding to columns with the bits set
to 1 in $\bar{a} \,\&\, M[i]$, except for the already existing watch variable.
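
The four cases above can be sketched in a few lines of Python, using arbitrary-precision integers as bit fields. This is only an illustration of the case analysis, not the solver's implementation: the names `classify_row`, `from_bits`, and `RowState`, as well as the convention that character j of a bit string maps to variable x_{j+1}, are assumptions made for this example.

```python
from enum import Enum


class RowState(Enum):
    SATISFIED = 1
    CONFLICT = 2
    PROPAGATE = 3
    NEW_WATCH = 4


def from_bits(s):
    """Pack a string such as "10011" into an int; character j becomes bit j."""
    return sum(1 << j for j, ch in enumerate(s) if ch == "1")


def popcount(x):
    """Hamming weight of a non-negative integer."""
    return bin(x).count("1")


def classify_row(row, rhs, assigned, value):
    """Return (state, propagated variable or None, propagated value or None).

    row, assigned, value are bit fields; rhs is the right-hand side b[i].
    Variables are numbered from 1.
    """
    unassigned = row & ~assigned            # columns of the row still unassigned
    w_unass = popcount(unassigned)
    parity = (popcount(value & row) & 1) ^ rhs
    if w_unass == 0:
        state = RowState.SATISFIED if parity == 0 else RowState.CONFLICT
        return state, None, None
    if w_unass == 1:
        var = unassigned.bit_length()       # single set bit j gives variable j + 1
        return RowState.PROPAGATE, var, parity
    return RowState.NEW_WATCH, None, None


row = from_bits("10011")
print(classify_row(row, 1, from_bits("10110"), from_bits("10110")))
```

In a native implementation the two `popcount` calls would map to hardware population-count instructions over packed 64-bit words, as discussed under Performance.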


Reason Generation. For propagation and conflict we generate the reason clauses
for row $i$ as follows. We forward-scan $M[i]$ for all set bits and insert the
corresponding variable into the reason clause as a literal that evaluates to false
under the current assignment. In the case of propagation, the literal added for the
propagated variable, say $x_j$, is added as literal $\neg x_j$ if $(W_{val} \bmod 2) \oplus b[i] = 0$
and $x_j$ otherwise.


Example. For example, let $b[i] = 1$ and $M[i] = 10011$ corresponding to
variables $x_1, x_2, \ldots, x_5$ and assignments 1?11? respectively, where “?” indicates an
unassigned variable. Then $a = 10110$, $\bar{a} \,\&\, M[i] = 00001$, $W_{unass} = 1$, $v = 10110$,
$v \,\&\, M[i] = 10010$, $W_{val} = 2$ and $(W_{val} \bmod 2) \oplus b[i] = 1$. Therefore, this
row propagates (case 3 above), and the reason generated is $(\neg x_1 \lor \neg x_4 \lor x_5)$. If
the assignments were 11110, then $W_{unass} = 0$ and $(W_{val} \bmod 2) \oplus b[i] = 1$ so
this row conflicts (case 2 above), with conflict clause $(\neg x_1 \lor \neg x_4 \lor x_5)$.
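
The forward scan can be reproduced with a short Python sketch that rebuilds both clauses of the example. The function `reason_clause`, the helper `from_bits`, and the signed-integer literal encoding are hypothetical names and conventions chosen for this illustration.

```python
def from_bits(s):
    """Pack a string such as "10011" into an int; character j becomes bit j."""
    return sum(1 << j for j, ch in enumerate(s) if ch == "1")


def reason_clause(row, rhs, assigned, value, num_vars):
    """Build the reason (or conflict) clause for one row.

    Literals are signed integers: +k stands for x_k and -k for its negation.
    """
    # Value forced on the single unassigned variable, if there is one.
    propagated_value = (bin(value & row).count("1") & 1) ^ rhs
    clause = []
    for j in range(num_vars):
        if not (row >> j) & 1:
            continue                      # variable does not occur in this XOR
        var = j + 1
        if (assigned >> j) & 1:
            # Assigned variable: add the literal that is false right now.
            clause.append(-var if (value >> j) & 1 else var)
        else:
            # Propagated variable: add the literal that becomes true.
            clause.append(var if propagated_value else -var)
    return clause


row = from_bits("10011")
print(reason_clause(row, 1, from_bits("10110"), from_bits("10110"), 5))
print(reason_clause(row, 1, from_bits("11111"), from_bits("11110"), 5))
```

Both calls yield the clause over x_1, x_4, and x_5 given in the example, the first as a reason for propagating x_5 and the second as a conflict clause.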


Performance. Notice that all cases only require bitwise AND, inverse, Hamming
weight, and find first set operations. To find a new watch in case 4, we first find the
first bit that is set to 1 in $\bar{a} \,\&\, M[i]$ by invoking find first set. In case the obtained
index is the same as the existing watch variable, we remove the first 1-bit by
left shifting and run find first set again to find the second 1-bit. Bitwise AND and
inverse are trivially single assembly instructions. We use compiler intrinsics to
execute find first set and hamming weight functions, which compile down to BSF 
and POPCNT in x86 assembly, respectively. It is worth pointing out that we 
keep the bit field representations of a and v synchronized when variables are 
assigned. During backtracking we reset these to zero and refill them as needed. 
For better cache efficiency, we use sequential set of bit-packed 64-bit integers to 
represent all bit-fields, rows, and matrices. 
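
The watch search of case 4 can be sketched as follows. This Python illustration works on a single integer rather than on packed 64-bit words, and it clears the lowest set bit with `x & (x - 1)` instead of shifting; both choices, along with the function names, are assumptions of the sketch rather than details of the solver.

```python
def lowest_set_bit(x):
    """Index of the lowest set bit (find first set), or -1 if x is zero."""
    return (x & -x).bit_length() - 1


def find_new_watch(unassigned_in_row, current_watch):
    """Pick an unassigned column of the row that differs from current_watch.

    unassigned_in_row is the bit field of unassigned columns of the row.
    Returns -1 if no other candidate exists.
    """
    first = lowest_set_bit(unassigned_in_row)
    if first != current_watch:
        return first
    # The first candidate is the existing watch: drop it and search again.
    rest = unassigned_in_row & (unassigned_in_row - 1)
    return lowest_set_bit(rest)


print(find_new_watch(0b10110, 1))
```

At most two find-first-set operations are needed per row, independent of the row length.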

Although bit-packing is not a novel concept in the context of CNF-XOR solv- 
ing, let us elaborate why we believe that our contribution is conceptually inter- 
esting. Soos et al. [20] used bit-packed pre- and post-evaluated matrices. Since 
post-evaluated matrices lose information, they have to be saved and reloaded
on backtracking. Han and Jiang’s code [13] changed this to using pre-evaluated
matrices only, which frees the system from having to save and reload. But it
was slow, because bit-by-bit evaluation had to happen on every matrix row read 
(thanks to the missing post-evaluation matrix). Our improved approach is essen- 
tially merging the best of both worlds: fast evaluation, without having to save 
and reload. 


4.3 Lazy Reason Clause Generation 


As discussed earlier, the current BIRD performs eager reason clause generation
in a spirit similar to the original proposal by Han and Jiang. At the time Han
and Jiang proposed eager clause generation, state-of-the-art SAT solvers could
solve problems with XOR clauses of sizes ranging from a few tens to a few
hundreds. The improved scalability, however, highlights the overhead due
to eager reason clause generation. During our profiling, we observed that for
several problems, the size of the independent support of the underlying formula
ranges in the thousands, leading to the generation of reason clauses involving
thousands of variables. The generation of such long reason clauses is time
consuming. Furthermore, a significant fraction of reason clauses are never
required during the conflict analysis phase, as we are often focused only on finding
a 1UIP clause. Therefore, we seek to explore lazy reason clause generation.

Let the state of a clause c indicate whether c is satisfied, conflicted or unde- 
termined (i.e., the clause is neither satisfied nor conflicted). The core design of 
our lazy generation technique is based on the following invariant satisfied by 
CDCL-based techniques: Once a (CNF/XOR) clause is satisfied or conflicted, 
the assignment to the variables in the clause does not change as long as the state of
the clause does not change. Observe that when a clause propagates, the propa- 
gated literal changes the state of the clause to satisfied. Furthermore, as long as 
all variables are assigned, the row will not participate in GJE because none of 
the contained variables can become a basic watch. Therefore, whenever an XOR 
clause propagates, we keep an index of the row and the propagating literal but 
do not compute the reason clause. Now, whenever a reason clause is requested, 
we compute the reason clause as detailed above and return a pointer to the 
computed reason clause, and index the computed clause by the corresponding 
row. To ensure correctness, whenever a row causes a propagation, we delete the 
existing reason clause but we do not eagerly compute the new corresponding 
reason clause. On the other hand, if a row is conflicting, the conflict analysis 
requires the reason clause immediately and as such the reason clause is eagerly 
computed. 
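
The bookkeeping described above can be sketched as a small Python class: a propagation only records the row and literal and invalidates any stale clause, while the clause itself is built on the first request. The class name, the callback signature, and the counter are hypothetical and serve only to make the laziness observable; one slot per row mirrors the observation that a row holds at most one reason clause at a time.

```python
class LazyReasonStore:
    """One reason slot per matrix row, filled only when requested."""

    def __init__(self, num_rows, compute_reason):
        self._compute = compute_reason        # callback: (row, literal) -> clause
        self._pending = [None] * num_rows     # propagated literal per row
        self._cache = [None] * num_rows       # computed clause per row
        self.computed = 0                     # number of clauses actually built

    def on_propagate(self, row, literal):
        """Record the propagation and drop any stale reason for this row."""
        self._pending[row] = literal
        self._cache[row] = None

    def reason(self, row):
        """Return the reason clause for the row, computing it on first use."""
        if self._cache[row] is None:
            self._cache[row] = self._compute(row, self._pending[row])
            self.computed += 1
        return self._cache[row]


# Hypothetical stand-in for the forward scan over a matrix row.
store = LazyReasonStore(3, lambda row, lit: [lit, -(row + 10)])
for r, lit in enumerate([4, -5, 6]):
    store.on_propagate(r, lit)
store.reason(1)
store.reason(1)
print(store.computed)
```

Three propagations trigger a single clause construction here, because conflict analysis (simulated by the two `reason` calls) asked for only one row.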

Lazy reason clause generation allows us to skip generating the majority of
reason clauses. Furthermore, given that a row cannot lead to more than one
reason clause, it allows us to statically allocate memory for them. This is in 
stark contrast to the original implementation that not only eagerly computed 
all reason clauses, but also dynamically allocated memory for them, freeing the 
memory up during backtracking. 


4.4 Skipping Solution Extension of Eliminated Variables


SAT solvers aim to present a clean and uncomplicated API, with internal
behavior typically hidden to enable fast-paced development of heuristics
without necessitating changes in the interface for end users. While such a
design philosophy allows easier integration, it may be a hindrance to achieving
efficiency for use cases that do not seek a simple off-the-shelf
behavior. Given the surge of projected counting and sampling as the desired 
formulation, BoundedSAT is invoked with a sampling set and we are interested 
only in the assignment to variables in the sampling set. A naive solution would 
be to obtain a complete assignment over the entire set of variables and then 
extract an assignment over the desired sampling set. In this context, we wonder 
if we can terminate early after the variables in the sampling set are assigned. In 
modern SAT solvers, once the solver has determined that the formula is satis- 
fied, the solution extension subroutine is invoked that extends the current partial 
assignment to a complete assignment. Upon profiling, we observed that, during 
solution extension, a significant amount of time is spent computing an assignment to the
variables eliminated due to Bounded Variable Elimination (BVE) [12] during 
pre- and inprocessing. When a solution is found, the eliminated clauses must be 
re-examined in reverse, linear, order to make sure the eliminated variables in the 
model are correctly assigned. This examination process can be time-consuming 
on large instances with large portions of the CNF eliminated. 

BVE is widely used in modern SAT solvers owing to its ability to elimi- 
nate a large subset of the input formula and thereby allowing compact data 
structures. While disabling BVE would eliminate the overhead during solution 
extension phase, it would also significantly degrade performance during solving 
phase. Since we are interested in solutions only over the sampling set, we disable 
the invocation of bounded variable elimination for variables in the sampling set. 
Therefore, whenever the SAT solver determines that the current partial assign- 
ment satisfies the formula, all the variables in the sampling set are assigned and 
we do not invoke solution extension. The disabling of solution extension can save 
significant (over 20%) time on certain instances. 
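
To see why skipping solution extension is safe in this setting, consider the following Python sketch of the usual reverse scan over eliminated clauses. It is a simplified textbook-style reconstruction, not CryptoMiniSat's code: the stack layout (pairs of a witness literal and a clause), the default of False for unassigned variables, and all names are assumptions of the example.

```python
def extend_solution(model, elimination_stack):
    """Extend a model to eliminated variables by a reverse scan.

    model maps variable -> bool; unassigned variables default to False.
    elimination_stack holds (witness_literal, clause) pairs in elimination order.
    """
    model = dict(model)
    for witness, clause in reversed(elimination_stack):
        satisfied = any(model.get(abs(l), False) == (l > 0) for l in clause)
        if not satisfied:
            model[abs(witness)] = witness > 0   # flip the eliminated variable
    return model


def project(model, sampling_set):
    """Restrict a model to the sampling set."""
    return {v: model[v] for v in sampling_set}


# Toy formula (x1 or x3) and (not x3 or x2); x3 was eliminated by BVE.
stack = [(3, [1, 3]), (-3, [-3, 2])]
partial = {1: False, 2: True}
full = extend_solution(partial, stack)
print(full, project(full, [1, 2]) == project(partial, [1, 2]))
```

Because extension only ever changes eliminated variables, and no sampling-set variable is eliminated, the projection onto the sampling set is identical before and after extension.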


4.5 Putting It All Together: BIRD2 


We combine improvements proposed above into our new framework, called 
BIRD2, a namesake to capture the primary architecture of Blast, In-process, 
Recover, Detach, and Destroy. For completeness, we present the core skeleton of 
BIRD2 in Algorithm 1. BIRD2 terminates as soon as a satisfying assignment is 
found or the formula is proven UNSAT. Similar to BIRD, BIRD2 architecture sep- 
arates inprocessing from CDCL solving and therefore every sound inprocessing 
step can be employed. 

Algorithm 1. BIRD2($\varphi$)    ▷ $\varphi$ has a mix of CNF and XOR clauses

1: Blast XOR clauses into normal CNF clauses
2: In-process (and pre-process) over CNF clauses
3: Recover XOR clauses
4: Detach CNF clauses implied by recovered XOR clauses
5: Perform CDCL on CNF clauses with on-the-fly improved GJE on XOR clauses
   until: (a) in-processing is scheduled, (b) a satisfying assignment is found, or (c)
   formula is found to be unsatisfiable
6: Destroy XOR clauses and reattach detached CNF clauses. Goto line 2 if conditions
   (b) or (c) above don’t hold. Otherwise, return satisfying assignment or report
   unsatisfiable.



5 Technical Contribution to Counting and Sampling 


In this section, we discuss our primary technical contribution to hashing-based 
sampling and counting techniques. 


5.1 Reuse of Previously Found Solutions 


The usage of prefix-slicing ensures monotonicity of the random variable
$\mathsf{Cnt}_{\langle F,i \rangle}$, since from the definition of prefix-slicing, we have that for all $i$,
$h^{(i+1)}(x) = \alpha^{(i+1)} \implies h^{(i)}(x) = \alpha^{(i)}$. Formally,

Proposition 1. For all $1 \le i < m$, $\mathsf{Cell}_{\langle F,i+1 \rangle} \subseteq \mathsf{Cell}_{\langle F,i \rangle}$.


Furthermore, as is evident from the analysis of ApproxMC3 [10], the pairwise
independence of the family $H_{xor}$ implies
$\frac{E[\mathsf{Cnt}_{\langle F,i \rangle}]}{E[\mathsf{Cnt}_{\langle F,j \rangle}]} = 2^{j-i}$. Therefore, once we
obtain the set of solutions from invocation of BoundedSAT for $F \land (h^{(i)})^{-1}(0)$ (i.e.,
after putting $i$ XORs), we can potentially reuse the returned solutions when we
are interested in enumerating solutions for $F \land (h^{(j)})^{-1}(0)$. In particular, note
that if $i > j$, then Proposition 1 implies that all the solutions of $F \land (h^{(i)})^{-1}(0)$
are indeed solutions for $F \land (h^{(j)})^{-1}(0)$ and we can invoke BoundedSAT with
adjusted threshold. On the other hand, for $i < j$, we can check if the solutions
of $F \land (h^{(i)})^{-1}(0)$ also satisfy $F \land (h^{(j)})^{-1}(0)$.

On closer observation, we find that the latter case may not always be helpful
when $i$ and $j$ differ by more than a small constant since the ratio of their expected
number of solutions decreases exponentially with $j - i$. Interestingly, as discussed
in Sect. 3, both ApproxMC3 and UniGen2 employ linear search over intervals of
sizes 4 to 7 for the right values of $m$. In particular, for both ApproxMC3 and
UniGen2, the linear search seeks to identify a value of $m^*$ such that
$\mathsf{Cnt}_{\langle F,m^*-1 \rangle} \ge \mathsf{thresh}$ and $\mathsf{Cnt}_{\langle F,m^* \rangle} < \mathsf{thresh}$ for an appropriately chosen thresh. Therefore,
when invoking BoundedSAT for $i = k$ after determining that for $i = k + 1$,
$\mathsf{Cnt}_{\langle F,k+1 \rangle} < \mathsf{thresh}$, we can replace thresh with $\mathsf{thresh} - \mathsf{Cnt}_{\langle F,k+1 \rangle}$. Similarly,
when invoking BoundedSAT for $i = k$ after determining that for $i = k - 1$,
$\mathsf{Cnt}_{\langle F,k-1 \rangle} \ge \mathsf{thresh}$, we first check how many solutions of $F \land (h^{(k-1)})^{-1}(0)$ satisfy
$F \land (h^{(k)})^{-1}(0)$. As noted above, in expectation, thresh/2 out of thresh solutions
of $F \land (h^{(k-1)})^{-1}(0)$ would satisfy $F \land (h^{(k)})^{-1}(0)$.
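
The two reuse directions can be sketched in Python as follows. The sketch assumes a simple encoding that is not prescribed by the text: a solution is a dictionary from variable to Boolean, an XOR is a pair of a variable list and a required parity, and the subsequent BoundedSAT call (not shown) is assumed to exclude the reused solutions.

```python
def satisfies_xor(solution, xor):
    """Check one parity constraint; xor is (variables, required_parity)."""
    variables, parity = xor
    return sum(solution[v] for v in variables) % 2 == parity


def reuse_after_adding_xor(prev_solutions, new_xor, thresh):
    """Moving from k - 1 to k XORs: keep the solutions that survive the new XOR.

    Returns the surviving solutions and the threshold left for BoundedSAT.
    """
    kept = [s for s in prev_solutions if satisfies_xor(s, new_xor)]
    return kept, max(thresh - len(kept), 0)


def reuse_after_removing_xor(next_solutions, thresh):
    """Moving from k + 1 to k XORs: by Proposition 1 every solution carries over."""
    kept = list(next_solutions)
    return kept, max(thresh - len(kept), 0)


found = [
    {1: True, 2: False, 3: True},
    {1: True, 2: True, 3: False},
    {1: False, 2: False, 3: False},
]
kept, remaining = reuse_after_adding_xor(found, ([1, 2], 0), thresh=5)
print(len(kept), remaining)
```

In both directions the solver is asked only for the solutions that are still missing, which is where the saving comes from.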


5.2 ApproxMC4 and UniGen3


That said, we turn our focus to hashing-based sampling and counting techniques 
to showcase the impact of BIRD2. To this end, we integrate BIRD2 along with the 
proposed technique in Sect. 5.1 into the state of the art hashing-based counting 
and sampling tools: ApproxMC3 and UniGen2 respectively. We call our improved 
counting tool ApproxMC4 and our improved sampling tool UniGen3. 


Assurance of Correctness. We believe it to be imperative to strongly verify 
correctness and quality of results provided by our tools, as it is not only pos- 
sible but indeed easy to accidentally generate incorrect or low quality results, 
as demonstrated by Chakraborty and Meel [5]. To ensure the quality and cor- 
rectness of our sampler and counter, we used three methods: (1) fuzzed the 
system as first demonstrated in SAT by Brummayer et al. [3], (2) compared 
the approximate counts returned by ApproxMC4 with the counts computed by 
a known good exact model counter as previously performed by Soos and Meel 
[19], and (3) compared the distribution of samples generated by UniGen3 on an
example problem against that of a known good uniform sampler as previously 
performed by Chakraborty et al. [9]. We focus on (1), i.e. fuzzing, here and defer 
the discussion about (2) and (3) to the next section. 

Fuzzing is a technique [17] used to find bugs in code by generating random 
inputs and observing crashes, invariant check fails, and other errors from the 
output of the system under test. CryptoMiniSat has such a built-in fuzzer gen- 
erating random CNFs and verifying the output of the solver. To account for 
XOR constraints, we improved the built-in fuzzer of CryptoMiniSat by adding 
a counting- and sampling-specific XOR-CNF generator. This inserts randomly 
generated XORs that form distinct matrices inside the generated CNFs and adds 
a randomly generated sampling set over some of these matrices. We also added 
hundreds of lines of invariant checks to our improved Gauss-Jordan elimination 
algorithm, running throughout our fuzzing tests. Running this improved fuzzer 
for many hundreds of CPU hours has greatly helped debugging and gaining 
confidence in our implementation. 


6 Evaluation 


To evaluate the performance and quality of approximations and samples com- 
puted by ApproxMC4 and UniGen3, we conducted a comprehensive study involv- 
ing 1896 benchmarks as released by Soos and Meel [16] comprising a wide range 
of application areas including probabilistic reasoning, plan recognition, DQMR 
networks, ISCAS89 combinatorial circuits, quantified information flow, program 
synthesis, functional synthesis, logistics, and the like. 

In the context of counting, we focused on a comparison of the performance of 
ApproxMC4 vis-a-vis ApproxMC3. In the context of sampling, a simple method- 
ology would have been a comparison of UniGen3 vis-a-vis the state of the art 
sampler, UniGen2. Such a comparison, in our view, would be unfair to UniGen2 
since ApproxMC3 builds on the BIRD framework, whereas UniGen2 does not.
It is worth noting that the BIRD framework, proposed by Soos and Meel [19], can 
work as a drop-in replacement for the SAT solver in UniGen2, as it only changes 
the underlying SAT solver. Therefore, we used UniGen2 augmented with BIRD, 
called UniGen2+BIRD henceforth, as baseline for performance comparisons in 
the rest of this paper, as it is significantly faster than UniGen2, and therefore, 
will lead to a fair comparison and showcase improvements solely due to BIRD2. 

To keep in line with prior studies, we set $\varepsilon = 0.8$ and $\delta = 0.8$ for both ApproxMC3
and ApproxMC4. Similarly, we set $\varepsilon = 16$ for both UniGen3 and
UniGen2+BIRD. The experiments were conducted on a high-performance
computer cluster, each node consisting of 2 × E5-2690v3 CPUs with 2 × 12
real cores and 96 GB of RAM. We use a timeout of 5000 s for each experiment,
which consisted of running a tool on a particular benchmark. 


6.1 Performance 


[Figure 1: scatter plot. x-axis: ApproxMC3 time (s); y-axis: ApproxMC4 time (s); axis ticks at 1, 10, 100, 1000, and 5000.]


Fig. 1. Comparison of ApproxMC4 and ApproxMC3. ApproxMC4 is faster below the 
diagonal. Time outs are plotted behind the 5000s mark. 


ApproxMC4 vis-a-vis ApproxMC3. Figure 1 shows a scatter plot comparing 
ApproxMC4 and ApproxMC3. Although there are some benchmarks that are
solved faster with ApproxMC3, there is a clear trend demonstrating the speedup
achieved through our improvements: ApproxMC4 can solve many benchmarks
more than 10 times faster and in total solves 77 more instances than ApproxMC3.
In particular, ApproxMC3 and ApproxMC4 solved 1148 and 1225 instances respec- 
tively, while achieving PAR-2 scores of 4146 and 3701 respectively. 
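
For readers unfamiliar with the metric, the following Python sketch computes a PAR-2 score under its usual SAT-competition definition, which the text does not restate: the mean runtime over all benchmarks, with every unsolved benchmark charged twice the timeout. The function name and the use of `None` for unsolved instances are assumptions of this example.

```python
def par2(runtimes, timeout):
    """Penalized average runtime with factor 2.

    runtimes holds seconds per benchmark, or None for a timeout or memory-out.
    """
    penalized = [
        t if t is not None and t <= timeout else 2 * timeout
        for t in runtimes
    ]
    return sum(penalized) / len(penalized)


# Hypothetical toy data: two solved benchmarks and one timeout at 5000 s.
print(par2([10.0, None, 30.0], timeout=5000))
```

Lower scores are better, and a single additional solved instance removes a penalty of twice the timeout from the sum.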

Fig. 2. Cactus plot showing behavior of ApproxMC4 and ApproxMC3


Figure 2 shows the cactus plot for ApproxMC3 and ApproxMC4. We present 
the number of benchmarks on the x-axis and the time taken on the y-axis. A 
point (x,y) implies that x benchmarks took less than or equal to y seconds to
solve for the corresponding tool. 

To present a detailed picture of the performance gain achieved by ApproxMC4
over ApproxMC3, we present a runtime comparison of ApproxMC4 vis-a-vis
ApproxMC3 in Table 1 on a subset of benchmarks. Column 1 of the table
presents benchmark names, while columns 2 and 3 list the number of variables
and clauses. Columns 4 and 5 list the runtime (in seconds) of ApproxMC3
and ApproxMC4, respectively.

While investigating the large improvements in performance, we observed that 
when both the sampling set and the number of solutions are large for a problem,
the new system can be up to an order of magnitude faster. In these cases the
Gauss-Jordan elimination (GJE) component of the SAT solver dominated the
runtime of ApproxMC3 due to the large matrices involved in such problems. The
improvements of BIRD2 have led to a significant improvement in the efficiency of
the GJE component and we observe that the runtime, in such instances, is now often
dominated by the CDCL solver’s propagation and conflict clause generation 
routines. 


UniGen3 vis-a-vis UniGen2+BIRD. Similar to Fig. 2, Fig. 3 shows the cactus
plot for UniGen3, UniGen2+BIRD, and UniGen2. We present the number
of benchmarks on the x-axis and the time taken on the y-axis. UniGen3 and
UniGen2+BIRD were able to solve 1063 and 1012 instances, respectively, while
achieving PAR-2 scores of 4574 and 4878, respectively. UniGen2 could solve only
360 benchmarks, thereby justifying our choice of implementing UniGen2+BIRD
as a baseline for fair comparison to showcase strengths of BIRD2. We would like
to highlight that the cactus plot shows that given a 2600 s timeout, UniGen3 can
sample as many benchmarks as UniGen2+BIRD would do for a 5000 s timeout.
To present a clear picture of the performance gain of UniGen3 over
UniGen2+BIRD, we present a runtime comparison of UniGen3 vis-a-vis
UniGen2+BIRD in Table 1, where, in addition to the data on ApproxMC3 and
ApproxMC4, columns 6 and 7 list the runtime of UniGen2+BIRD and UniGen3,
respectively. Similar to the observation above, we note that UniGen3 is able to
sample for instances that timed out even for ApproxMC3. It is worth recalling
that UniGen3 (and UniGen2) first makes a call to an approximate counter during
its parameter search phase.


Table 1. Performance comparison of ApproxMC3 vis-a-vis ApproxMC4 and
UniGen2+BIRD vis-a-vis UniGen3. TO indicates timeout after 5000 s or out of memory.
Notice that on many problems that used to time out even for counting, we can now
confidently sample.

Benchmark           Vars     Cls       ApproxMC3   ApproxMC4   UniGen2+BIRD   UniGen3
                                       time (s)    time (s)    time (s),      time (s),
                                                               500 samples    500 samples
or-70-5-1-UC-20     140      350       6.03        2.07        14.21          6.08
prod-4              7497     37358     56.65       7.09        171.57         36.54
min-8               1545     4230      152.53      5.58        471.47         35.04
parity.sk_11_11     13116    47506     389.26      436.32      705.85         809
leader_sync4_11     205198   129149    346.4       20.55       1019.09        106.93
blasted_TR_b12_2.   2426     8373      308.08      20.46       1218.01        546.62
hash-8-6            377545   1517574   462.28      266.59      1321.91        633.84
s15850a_15_7        10995    24836     1206.17     31.69       2782.96        230.17
ConcreteRole        395951   1520924   1694.19     309.07      3083.99        923.69
tire-3              577      2004      3059.19     233.28      3876.03        797.42
04B-2               19510    86961     1860.97     625.81      TO             2236.31
blasted_case138     849      2253      TO          3691.9      TO             TO
hash-11-4           518449   2082039   4602.95     4043.4      TO             TO
karatsuba.sk_7_41   19594    82417     3192.85     3410.36     TO             TO
log-3               1413     29487     TO          123.15      TO             408.25
modexp8-8-6         167793   633614    4439.21     TO          TO             TO
or-100-5-6-UC-20    200      500       TO          1689.47     TO             4898.43
prod-28             52233    261422    TO          235.02      TO             1053.9
s38417_15_7         25615    57946     TO          187.71      TO             TO
signedAvg           30335    91854     TO          114.15      TO             582.01


Remark 1. Since the runtime improvements of ApproxMC4 and UniGen3 are pri- 
marily due to improvements in the underlying SAT solver, it is worth pointing 
out, to put our contribution in context, that the difference between average 
PAR-2 scores of the top two solvers in a SAT competition is usually less than 100.


6.2 Quality and Correctness


Quality of Counting. To evaluate the quality of approximation we follow
the same approach as Soos and Meel [19] and compare the approximate counts
returned by ApproxMC4 with the counts computed by an exact model counter,
namely DSharp⁴. The approximate counts and the exact counts are used to
compute the observed tolerance $\varepsilon_{obs}$, which is defined as
$\max\left(\frac{\mathsf{AprxCount}}{|\mathsf{sol}(F)_{\downarrow S}|} - 1, \frac{|\mathsf{sol}(F)_{\downarrow S}|}{\mathsf{AprxCount}} - 1\right)$,
where AprxCount is the estimate computed by ApproxMC4 and $|\mathsf{sol}(F)_{\downarrow S}|$ is the exact count
for a formula F and a sampling set S, which are both given for each benchmark.
Note that, using $\varepsilon_{obs}$, we can rewrite the theoretical $(\varepsilon, \delta)$-guarantee to
$\Pr[\varepsilon_{obs} \le \varepsilon] \ge 1 - \delta$ and hence we expect that $\varepsilon_{obs}$ is mostly below $\varepsilon = 0.8$.
The observed tolerance $\varepsilon_{obs}$ over all benchmarks is shown in Fig. 4. We observe
a maximal value for $\varepsilon_{obs}$ of 0.3333 and the arithmetic mean of $\varepsilon_{obs}$ across
all benchmarks is 0.0411. Hence, the approximate counts are much closer to the
exact counts than is theoretically guaranteed.
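
The observed tolerance is a symmetric relative error and can be computed directly from the two counts. The Python sketch below is an illustration only; the function name and the toy counts are assumptions, not values from the experiments.

```python
def observed_tolerance(approx_count, exact_count):
    """Symmetric relative error between an estimate and the exact count.

    Both counts are assumed to be positive.
    """
    return max(approx_count / exact_count - 1, exact_count / approx_count - 1)


# Hypothetical counts: an estimate of 12 against an exact count of 16.
print(round(observed_tolerance(12, 16), 4))
```

A value of 0 means the estimate is exact, and the guarantee is met whenever the result does not exceed the chosen tolerance parameter.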

Fig. 3. Sampling performance of UniGen2 and UniGen2+BIRD versus UniGen3.

Fig. 4. The histogram of the observed tolerance $\varepsilon_{obs}$ shows that the approximate counts
are very close to the exact counts.


4 DSharp is used because of its ability to handle sampling sets. 

Quality of Sampling. To evaluate the quality of sampling, we employed the
uniformity tester, Barbarik, proposed by Chakraborty and Meel [5]. To this 
end, we selected 35 benchmarks from the pool of benchmarks employed by 
Chakraborty and Meel in their work and we tested UniGen3 for all the 35 
benchmarks. We observed that Barbarik accepts UniGen3 for all the 35 instances, 
thereby providing a certificate for uniformity. We refer the reader to [5] for 
detailed discussion of the guarantees provided by Barbarik. Keeping in line with 
past work on sampling that tries to demonstrate the quality of sampling on a rep- 
resentative benchmark where exact uniform sampling is feasible via enumeration- 
based techniques, we chose the CNF instance blasted_case110 (287 variables and 
16384 solutions), which has been chosen in the previous studies as well. To this 
end, we implemented a simple ideal uniform sampler, denoted by US henceforth, 
by enumerating all the solutions and then picking a solution uniformly at
random. We then generate 4,039,266 samples from both UniGen3 and US. In each
case, the number of times various witnesses were generated was recorded, yield- 
ing a distribution of the counts. Fig. 5 shows the distributions of counts generated 
versus # of solutions. The x-axis represents counts and the y-axis represents the 
number of witnesses appearing the specified number of times. Thus, the point 
(230,212) represents the fact that each of 212 distinct witnesses was generated
230 times among the 4,039,266 samples. While UniGen3 provides guarantees of
almost-uniformity only, the two distributions are statistically indistinguishable.
In particular, the KL divergence [15] of the distribution by UniGen3 from that of
US is 0.003989.
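
The recurrence histogram and a KL divergence between two empirical distributions can be computed as in the Python sketch below. The text does not specify the logarithm base or exactly which pair of distributions enters the divergence, so the natural logarithm and the generic count-based interface used here are assumptions, as are the function names and toy data.

```python
from collections import Counter
import math


def recurrence_histogram(samples):
    """Map 'number of times generated' to 'number of witnesses with that count'."""
    per_witness = Counter(samples)
    return Counter(per_witness.values())


def kl_divergence(p_counts, q_counts):
    """KL divergence of P from Q, both given as outcome -> count mappings.

    Uses the natural logarithm; raises if Q lacks an outcome that P contains.
    """
    p_total = sum(p_counts.values())
    q_total = sum(q_counts.values())
    divergence = 0.0
    for outcome, count in p_counts.items():
        if count == 0:
            continue
        q_count = q_counts.get(outcome, 0)
        if q_count == 0:
            raise ValueError("Q assigns zero probability to an outcome of P")
        p = count / p_total
        q = q_count / q_total
        divergence += p * math.log(p / q)
    return divergence


# Hypothetical toy samples, one character per witness.
print(recurrence_histogram("aabccc"))
print(kl_divergence(Counter("ab"), Counter("aaab")))
```

A divergence close to 0 indicates that the two empirical distributions are nearly identical.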

Fig. 5. Distribution of solution recurrence as generated by UniGen3 and US for the
CNF blasted_case110.cnf. 


7 Conclusions 


We investigated the bottlenecks of CNF-XOR solving in the context of hashing- 
based approximate model counting and almost uniform sampling as implemented 
in ApproxMC3 and UniGen2 respectively. In this paper, we proposed five technical
improvements, as follows: (1) detaching the clausal representation of XOR
constraints from unit propagation, (2) lazy reason generation for XOR
constraints, (3) bit-level parallelism for XOR constraint propagation, (4) partial
solution extraction only covering the sampling set, and (5) solution reuse. These
improvements were incorporated into the new framework BIRD2, which led to 
the construction of improved approximate model counter ApproxMC4 and almost 
uniform sampler UniGen3. Experiments over a large set of benchmarks from vari- 
ous domains clearly show an improvement in running time and 77 more problems 
could be solved for counting and 51 more for sampling. 


Abstract. The automatic formal verification of multiplier designs has
been pursued since the introduction of BDDs. We present a new rewriter- 
based method for efficient and automatic verification of signed and 
unsigned integer multiplier designs. We have proved the soundness of this 
method using the ACL2 theorem prover, and we can verify integer multi- 
plier designs with various architectures automatically, including Wallace, 
Dadda, and 4-to-2 compressor trees, designed with Booth encoding and 
various types of final stage adders. Our experiments have shown that our 
approach scales well in terms of time and memory. With our method, we 
can confirm the correctness of 1024 x 1024-bit multiplier designs within 
minutes. 


Keywords: Multipliers · Hardware verification · Formal methods · ACL2


1 Introduction 


Arithmetic circuit designs may contain bugs that may not be detected through 
random testing. Since the Pentium FDIV bug [29], formal verification has become 
more prominent for validating the correctness of arithmetic circuits. Despite 
being a crucial part of all processors, verifying the correctness of arithmetic 
circuits, specifically multipliers, is still an ongoing challenge. 

There have been numerous efforts to find a scalable and automated method to 
formally verify integer multipliers. Early methods, which attempted
to represent hardware and its specification in various canonical forms such as
BDDs [6] and their derivatives, have exponential space complexity. Therefore, they were
applicable only for small circuits. Similarly, SAT-based methods did not prove 
to be scalable [28]. 

There are several approaches for the verification of hardware multipliers used 
in the industry. One is based on writing a simple RTL multiplier design without 
optimizations and comparing it to the candidate multiplier design through equiv- 
alence checking [14,35]. This approach works only when the reference design is 
structurally close to the original under verification and relies on the correctness 
of the reference design and proof maintenance whenever designers make structural
changes. Another approach is to find a suitable decomposition of a design
into parts that can be verified automatically and compose those results into a 
top-level theorem [13,15,30]. The drawback of this method is that it requires 
manual intervention by the verification engineer who decides about the bound- 
aries of the decomposition. A third approach involves guiding a mechanized proof 
checker manually [27]. 

In recent years, the search for more automatic procedures resulted in methods 
based on symbolic computational algebra [7,16,22,23,40]. These methods make
it possible to verify certain types of multipliers automatically for larger
designs. However, they are limited in the types of multipliers they can
check (see experiments in Sect. 6). They are implemented as unverified programs
and, as far as we are aware, only one of them [16] produces certificates.

We have developed an automatic rewriter-based method for verification of 
hardware integer multipliers that is 


— widely applicable, 
— provably correct, and 
— scalable 


We implemented and verified our method with the ACL2 theorem proving 
system, which is a subset of the LISP programming language. Our method is not 
ACL2 specific and can be adapted to other platforms with suitable adjustments. 
In this paper, we also provide proof of its termination. Even though we have not 
proved the completeness of this method, our tool can verify various multiplier 
designs. We test our method on designs implemented with (System) Verilog 
where design hierarchy is maintained. We can verify various types of multipliers 
in a favorable time; for example, we tested our method with 8 different types of 
1024 x 1024 multipliers and verified each of them in less than 10 min, while the 
other state-of-the-art tools ran for more than 3 h.

The paper is structured as follows. In Sect. 2, we present some concepts that 
might be necessary to understand our approach. These include the basic notion 
of term rewriting and the ACL2 system (Sect. 2.1), the semantics for hardware 
modeling (Sect. 2.2), and some basic multiplier architectures (Sect. 2.3).
Preliminaries are followed by our specification and top-level correctness
theorem for multiplier designs (Sect. 3). We explain our methodology to prove
this top-level correctness theorem with term rewriting in Sect. 4. Section 5
describes the termination of our rewriting algorithm. Experiments with various
benchmarks are given and discussed in Sect. 6.


2 Preliminaries 


In this section, we describe the concepts and tools required to understand the 
method proposed in this paper. We review the ACL2 theorem prover and term 
rewriting, how Verilog designs are translated and used in proofs, and various 
integer multiplier architectures. 


2.1 ACL2 and Term Rewriting


ACL2 is a LISP-based interactive theorem prover that can be used to model 
computer systems and prove properties about such models using both its internal 
procedures as well as appealing to external tools such as SAT and SMT solvers. 
ACL2 is used by the industry for both software and hardware verification [12]. 
Our methodology to prove multipliers correct uses ACL2-based term rewriting. 

ACL2 can store proved lemmas as rewrite rules, and later use them when
attempting to confirm other conjectures. ACL2 terms are prefix expressions
and rewriting is attempted on terms such as (fnc arg1 arg2 ...). The left-hand
side of a rewrite rule is unified with terms; in case of a successful unification,
the matched term is replaced by a properly instantiated right-hand side if all
hypotheses are satisfied. Example 1 shows two rewrite rules, the second of which
can be proved using the first as a lemma. When users submit a defthm event,
ACL2 attempts to confirm the conjecture by rewriting it in an inside-out
manner. For the conjecture given in x-x_y-y, the rewriter replaces (+ x (- x))
and (+ y (- y)) with 0 using a-a as a lemma. Then the resulting term (+ 0 0)
is replaced with 0 using the executable counterpart of the function +.


Example 1. A simple rewrite rule a-a, and a theorem x-x_y-y proved subse- 
quently using a-a as a lemma. 


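;; Rewrite rule: an integer plus its negation is 0.
(defthm a-a
  (implies (integerp a)
           (equal (+ a (- a)) 0)))

;; Proved by rewriting with a-a and then evaluating (+ 0 0).
(defthm x-x_y-y
  (implies (and (integerp x) (integerp y))
           (equal (+ (+ x (- x)) (+ y (- y)))
                  0)))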


The rewriting mechanism in ACL2 is much more complex and intricate than 
we indicate here [18]. Throughout the rest of this paper, we omit ACL2 spe- 
cific implementation details whenever possible. Understanding the basics of term 
rewriting is sufficient to follow our methodology. 
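
To make the inside-out rewriting of Example 1 concrete, the following is a minimal Python sketch of a toy rewriter. It is not ACL2's rewriter: the tuple-based term representation and the function name `rewrite` are assumptions made for illustration, and the `integerp` hypothesis of a-a is simply assumed to hold.

```python
# Terms are nested tuples in prefix form, e.g. ("+", "x", ("-", "x")).
# Variables are strings; integer constants are Python ints.
# Hypothetical toy rewriter; it only knows the rule a-a and constant folding.

def rewrite(term):
    """Rewrite a term inside-out: arguments first, then the term itself."""
    if not isinstance(term, tuple):
        return term  # a variable or a constant: nothing to rewrite
    op, *args = term
    args = [rewrite(arg) for arg in args]
    # Rule a-a: (+ a (- a)) -> 0 (the integerp hypothesis is assumed to hold).
    if op == "+" and len(args) == 2 and args[1] == ("-", args[0]):
        return 0
    # Executable counterpart of +: evaluate when all arguments are constants.
    if op == "+" and all(isinstance(arg, int) for arg in args):
        return sum(args)
    return (op, *args)


# Left-hand side of the conjecture x-x_y-y.
conjecture_lhs = ("+", ("+", "x", ("-", "x")), ("+", "y", ("-", "y")))
result = rewrite(conjecture_lhs)
```

Both inner sums are rewritten to 0 by the rule before the outer sum is evaluated, mirroring the order of steps described above.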


2.2 Semantics for Hardware Designs 


We convert (System) Verilog designs to SVL netlists in ACL2 and use SVL 
functions for semantics and simulation of circuit designs [33]. SVL netlists pre- 
serve hierarchical information about hardware designs and they are based on the 
SV [31] and VL [32] tools that are also included in the ACL2 libraries. These 
tools have been used by several companies to confirm the correctness of various 
circuit designs [12]. In this section, we describe the format of SVL netlists, and 
how they are simulated hierarchically. 

An SVL netlist is an association list where each key is a module name, and its 
corresponding value is the definition of the module. An SVL module is composed 
of input and output signals, and a list of occurrences. An occurrence can be an 
assignment or an instantiation of another module. Example 2 shows a simplified
SVL netlist containing a half and a full-adder. 


Example 2. An SVL netlist for half and full-adder. 


(("ha" (inputs x y)
       (outputs s c)
       (occs ((occ1 :assign s (bitxor x y))
              (occ2 :assign c (bitand x y)))))
 ("fa" (inputs x y z)
       (outputs s c)
       (occs ((occ1 :module "ha" (ins x y) (outs t1 t2))
              (occ2 :module "ha" (ins t1 z) (outs s t3))
              (occ3 :assign c (bitor t2 t3))))))


The semantics of an SVL netlist is given by a recursively defined ACL2 func- 
tion, svl-run. This function traverses occurrences of a module and simulates 
them in order by evaluating the assignments and making a recursive call for 
the submodules. After each occurrence, the values of wires/signals are stored in 
an association list, and when finished, svl-run retrieves and returns the val- 
ues of output signals from this association list. These values can be concrete 
(svl-run is executed), or symbolic (the rewriter processes a call of svl-run 
with variables for inputs), which can create ACL2 expressions representing the 
functionality of the design for each output. For example, we can generate
expressions for the outputs of the full-adder ("fa") in Example 2: (⊕ x y z) and
(∨ (∧ x y) (∧ (⊕ x y) z)). Alternatively, since the design retains hierarchy,
submodules can be replaced by their specification. For example, assume
that we have specification functions s-ha and c-ha for each output of the
half-adder ("ha"), and we proved a rewrite rule to replace calls of svl-run of "ha"
with these functions. If we rewrite the instantiations of "ha" with this rule while
expanding the definition of "fa", we can instead get (s-ha (s-ha x y) z) and
(∨ (c-ha x y) (c-ha (s-ha x y) z)) for each output of "fa".
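
The hierarchical simulation described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the dictionary layout `NETLIST` and the function `run` are hypothetical stand-ins for the SVL netlist format and svl-run, and assignments are modeled as Python lambdas over the wire table.

```python
# Hypothetical Python mirror of the netlist in Example 2 (not the SVL format).
NETLIST = {
    "ha": {
        "inputs": ["x", "y"],
        "outputs": ["s", "c"],
        "occs": [
            ("assign", "s", lambda w: w["x"] ^ w["y"]),
            ("assign", "c", lambda w: w["x"] & w["y"]),
        ],
    },
    "fa": {
        "inputs": ["x", "y", "z"],
        "outputs": ["s", "c"],
        "occs": [
            ("module", "ha", ["x", "y"], ["t1", "t2"]),
            ("module", "ha", ["t1", "z"], ["s", "t3"]),
            ("assign", "c", lambda w: w["t2"] | w["t3"]),
        ],
    },
}


def run(module_name, input_values, netlist=NETLIST):
    """Simulate a module: evaluate occurrences in order, recursing into submodules."""
    module = netlist[module_name]
    wires = dict(zip(module["inputs"], input_values))  # wire name -> value
    for occ in module["occs"]:
        if occ[0] == "assign":
            _, target, expr = occ
            wires[target] = expr(wires)
        else:
            _, submodule, ins, outs = occ
            results = run(submodule, [wires[i] for i in ins], netlist)
            wires.update(zip(outs, results))
    # Retrieve the output values from the wire table.
    return [wires[o] for o in module["outputs"]]
```

For concrete inputs, `run("fa", [x, y, z])` returns the sum and carry bits; a symbolic variant would build expressions instead of evaluating them.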


2.3 Multiplier Architectures 


In this section, we discuss the most commonly used algorithms to implement 
integer multipliers. We summarize partial-product generation algorithms, such 
as Booth encoding, and partial-product summation algorithms, such as Wallace- 
tree. Even though the applicability of our verification method is not confined to 
a specific set of algorithms, reviewing them is beneficial for understanding the 
verification problem. 

We can divide multiplier designs into two main components: partial product 
generation and summation. Figure 1a shows these two steps for the multiplication
of two 3-bit two’s-complement signed integers. We perform sign-extension (for 
signed numbers) or zero-extension (for unsigned numbers) on inputs, generate 
partial products, and then add them together to obtain the multiplication result 


Automated and Scalable Verification of Integer Multipliers 489 


in a fashion similar to grade-school multiplication. The integer multipliers we 
have verified implement various partial-product generation and summation algo- 
rithms for the same functionality with optimizations for better gate-delay and/or 
area. 


                  a2    a2    a2    a2    a1    a0
  x               b2    b2    b2    b2    b1    b0
  ------------------------------------------------
                a2b0  a2b0  a2b0  a2b0  a1b0  a0b0
                a2b1  a2b1  a2b1  a1b1  a0b1
                a2b2  a2b2  a1b2  a0b2
                a2b2  a1b2  a0b2
                a1b2  a0b2
  +             a0b2
  ------------------------------------------------
                out5  out4  out3  out2  out1  out0

                             (a)

  (b) [dot diagram of the Wallace-tree reduction layers]


Fig. 1. (a) Grade-school-like multiplication for two 3-bit two’s-complement integers, 
and (b) a Wallace-tree-like multiplier performing bit-level additions on the partial 
products 


Baugh-Wooley [1] and Booth [2] are commonly used algorithms to generate 
partial products. Baugh-Wooley is used for signed multiplication, and it generates
partial products as shown in Fig. 1a, but with a sign-extension algorithm to
prevent the repetition of generated partial product bits. A more commonly used 
alternative is Booth encoding, which can be used for both signed and unsigned 
multiplication. Instead of simply multiplying all the single bits of the two inputs 
with each other, Booth encoding uses more than one bit at a time from one of the 
operands, and it derives a more complex form for partial products. This helps 
reduce the number of rows for partial products, thus helping shrink the summa- 
tion circuitry and allowing more parallelism. Booth encoding can be implemented 
with different radices, which determine the number of multiplier bits used at a 
time to create partial products (e.g., Booth radix-4 [21] uses 3 bits at a time). The 
higher the radix, the fewer the partial products; however, higher radices yield 
a more complex design. Booth encoding can be combined with sign-extension 
algorithms [38] to prevent repetition in generated partial products. 
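
As an illustration of how radix-4 Booth encoding halves the number of partial-product rows, the sketch below recodes a two's-complement multiplier into digits in {-2, ..., 2}, each derived from three overlapping bits. The function name `booth_radix4_digits` and the restriction to an even bit width are assumptions for this example; it models the arithmetic only, not any particular circuit.

```python
def booth_radix4_digits(b, n):
    """Recode an n-bit two's-complement value b (n even) into radix-4 Booth digits.

    Digit k is -2*b[2k+1] + b[2k] + b[2k-1], with the implicit bit b[-1] = 0,
    so each digit looks at three bits of the multiplier.
    """
    assert n % 2 == 0, "this sketch assumes an even bit width"
    bits = [(b >> i) & 1 for i in range(n)]  # >> is arithmetic, so negatives work
    digits = []
    prev = 0  # implicit bit b[-1]
    for i in range(0, n, 2):
        digits.append(-2 * bits[i + 1] + bits[i] + prev)
        prev = bits[i + 1]
    return digits


def booth_multiply(a, b, n):
    """Sum the n/2 partial products a * digit * 4**k."""
    return sum(a * d * 4 ** k for k, d in enumerate(booth_radix4_digits(b, n)))
```

With n = 4, only two partial products are summed instead of four, at the cost of each partial product being one of 0, ±a, or ±2a.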

A rudimentary way to sum partial products is by using a shift-and-add algo- 
rithm. One may use an accumulator and a vector adder such as a ripple-carry 
adder to shift and add partial products. An array multiplier is a variation of this 
algorithm and it is implemented using a very similar principle with some addi- 
tional optimizations. Due to their regular structure, verifying the correctness of 
these multipliers has not been a challenging problem [5]. However, these circuits
often have very large gate delays, and Wallace-tree like multipliers are preferred 
over these algorithms in industrial applications. 

A family of partial product summation algorithms, which are often called 
Wallace-tree like multipliers [36], use parallelism to obtain multiplication results 
with less gate-delay but produce a very irregular and complex design structure. 
Figure 1b shows an example of a Wallace-tree algorithm. In the first summation
layer, we see the generated partial products corresponding to the ones in Fig. 1a.
The Wallace-tree algorithm selects groups of bits from these partial products
and passes them to full and half-adders. After these parallel bit-level additions,
the resulting carry and sum output bits are placed on another layer whose
summation will also yield the multiplication result. At each stage, layers are
compressed, and the number of rows decreases. We repeat this process until we reach
a state where we have only two rows. Then, instead of using full and half adders
to finish additions, a vector adder (final stage adder), such as carry-lookahead 
and parallel prefix adders, is used. This method may provide a significant delay 
reduction over array multipliers. There exist numerous variations of Wallace-tree 
multipliers such as Dadda-tree [8] and 4-to-2 compressor trees [11]. Due to their 
highly irregular structure, reasoning about Wallace-tree like multipliers is a diffi- 
cult problem, especially when combined with complex partial product generation 
algorithms such as Booth encoding. There is a lot of room for circuit designers 
to deviate from text-book algorithm definitions when creating multipliers, which 
increases the importance of having an automated method to verify these circuits 
with minimal assumptions about the structure. 
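
The layer-by-layer compression can be sketched as follows. This is a simplified model under stated assumptions: it handles unsigned operands with simple partial products, uses only full adders (no half-adders) during compression, and the names `full_adder` and `wallace_multiply` are hypothetical. Real Wallace-tree or Dadda-tree designs choose the groups differently.

```python
def full_adder(x, y, z):
    """Return the (sum, carry) bits of three input bits."""
    total = x + y + z
    return total % 2, total // 2


def wallace_multiply(a, b, n):
    """Multiply two unsigned n-bit integers by column compression."""
    width = 2 * n
    # columns[j] holds the bits of weight 2**j; start with simple partial products.
    columns = [[] for _ in range(width)]
    for i in range(n):
        for k in range(n):
            columns[i + k].append((a >> i) & (b >> k) & 1)
    # Compress until every column has at most two bits (two rows remain).
    while any(len(col) > 2 for col in columns):
        next_columns = [[] for _ in range(width)]
        for j, col in enumerate(columns):
            groups = len(col) // 3
            for g in range(groups):
                s_bit, c_bit = full_adder(*col[3 * g:3 * g + 3])
                next_columns[j].append(s_bit)
                if j + 1 < width:
                    next_columns[j + 1].append(c_bit)
            next_columns[j].extend(col[3 * groups:])  # leftover bits pass through
        columns = next_columns
    # Final stage adder: add the two remaining rows as ordinary integers.
    row0 = sum(col[0] << j for j, col in enumerate(columns) if len(col) > 0)
    row1 = sum(col[1] << j for j, col in enumerate(columns) if len(col) > 1)
    return (row0 + row1) % (1 << width)
```

Each full adder replaces three bits with two, so the total number of bits strictly decreases in every pass and the loop terminates.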


3 Specification 


We aim to prove the functional correctness of signed and unsigned multiplier 
designs. We do that by proving an ACL2 theorem demonstrating the equivalence 
of semantics of a multiplier circuit design to the built-in ACL2 multiplication 
function (*) with appropriate sign extensions and truncations. 

We work with integer multiplier circuits that are designed to multiply two
numbers (signed or unsigned) stored in bit-vectors and cut (truncate) the
resulting number to return it as a bit-vector. If we are multiplying m-bit and n-bit
numbers, then the least significant m+n bits of the result are sufficient to
represent all output values. For example, assume that we are multiplying signed
numbers -4 and 3, represented with 4-bit vector 1100 and 3-bit vector 011,
respectively. Then, a correct multiplier would return the 7-bit vector 1110100,
which represents -12.
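
The signed specification can be mimicked on plain integers as in the sketch below. The helper names `signext`, `truncate`, and `multiplier_spec` are assumptions chosen to echo the terminology of this section; they are not the ACL2 definitions.

```python
def signext(size, x):
    """Interpret the low `size` bits of x as a two's-complement signed integer."""
    x &= (1 << size) - 1
    return x - (1 << size) if x >> (size - 1) else x


def truncate(size, x):
    """Keep the low `size` bits of x, giving an unsigned bit-vector value."""
    return x & ((1 << size) - 1)


def multiplier_spec(m, n, a, b):
    """Expected output of an m-by-n signed multiplier on bit-vectors a and b."""
    return truncate(m + n, signext(m, a) * signext(n, b))


# The example from the text: 4-bit 1100 (-4) times 3-bit 011 (3).
product = multiplier_spec(4, 3, 0b1100, 0b011)
assert signext(7, product) == -12
```

Reading the 7-bit result back with `signext` recovers the signed product, which is how the truncated bit-vector represents a negative value.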

Listing 1.1 shows the final ACL2 theorem we prove for signed integer
multipliers, where a and b are variables and *m* and *n* are concrete values¹. This
theorem states that for all integers a and b, simulating an m-by-n signed
multiplier circuit returns a value that is equivalent to multiplication of sign-extended
a and b, truncated at m + n bits. On the left-hand side, *signed_mxn_mult*
is an ACL2 constant that contains the multiplier design in SVL format which is
translated from (System) Verilog, and svl-run is the function to simulate this
module with inputs a and b. On the right-hand side, * is the built-in integer
multiplication function, truncate returns the least significant m+n bits of the
result, and signext returns a number that represents the sign-extended value of a
bit-vector. Multiplier designs are implemented with fixed values of m and n;
therefore, we prove such theorems for constants m and n and variables a and b.
The ACL2 theorem for unsigned multiplication has the same form but in the place
of signext, we use the truncate function, which performs zero-extension. The
actual statement of the theorem contains more components than shown, including
function calls to extract outputs and parameters for state-holding elements; we
only give the essentials for brevity.

¹ By convention, "*" characters surrounding variables, such as *m*, signify
constants in ACL2.


Listing 1.1. The Final Correctness Theorem for Signed Multipliers 


(defthm multiplier_is_correct
  (implies (and (integerp a)
                (integerp b))
           (equal (svl-run (list a b) *signed_mxn_mult*)
                  (truncate (+ *m* *n*)
                            (* (signext *m* a)
                               (signext *n* b))))))


4 Methodology 


The correctness theorem given in Listing 1.1 is proved by rewriting both sides 
of the equality to two syntactically equivalent terms. In this section, we describe 
our methodology to rewrite both sides to a specific form through an automated 
rewriting mechanism. 

We have a targeted final expression for each output bit of a multiplier design, 
the mathematical formula of which is given in Definition 2. The variables a and 
b are the inputs/operands of multiplication with a certain size (e.g., 64 bits for 
64 x 64 multiplication); and in this formula, they are sign-extended for two’s 
complement signed multiplication or zero-extended for unsigned multiplication. 


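Definition 1. We define functions s and c as follows.

\[ \forall x \in \mathbb{Z} \quad s(x) = \mathrm{mod}_2(x) \]
\[ \forall x \in \mathbb{Z} \quad c(x) = \left\lfloor \frac{x}{2} \right\rfloor \]


Definition 2. The targeted form for each output bit ($out_j$) is defined as follows.

\[ w_j = \begin{cases} \left( \sum_{i=0}^{j} a_i b_{j-i} \right) + c(w_{j-1}) & \text{if } j \geq 0 \\ 0 & \text{otherwise} \end{cases} \]
\[ out_j = s(w_j) \]

where $a_i b_{j-i}$ is the logical AND of the $i$th bit of operand $a$ and the
$(j-i)$th bit of operand $b$.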
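
The column recurrence of Definition 2 can be evaluated directly on integers, as in the following sketch. It assumes that s is the remainder modulo 2, that c is the floor of division by 2, and that operands are given as Python integers whose bits are read with sign extension; the name `output_bits` is hypothetical.

```python
def s(x):
    """Sum bit: x modulo 2."""
    return x % 2


def c(x):
    """Carry: floor of x divided by 2."""
    return x // 2


def output_bits(a, b, width):
    """Return [out_0, ..., out_{width-1}] following the column recurrence."""
    def bit(v, i):
        return (v >> i) & 1  # arithmetic shift, so negative ints are sign-extended

    outs = []
    w_prev = 0  # w_{-1} = 0
    for j in range(width):
        # Column j: all partial products a_i * b_{j-i}, plus the carry of column j-1.
        w = sum(bit(a, i) & bit(b, j - i) for i in range(j + 1)) + c(w_prev)
        outs.append(s(w))
        w_prev = w
    return outs
```

Reassembling the returned bits gives the product truncated to `width` bits, which is why this form can serve as the target of rewriting on both sides of the correctness theorem.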
Table 1 shows an example of this targeted final form for the first four output
bits of 3 x 3 two’s complement signed multiplication (see Fig. 1a). Each output
bit is represented with expressions composed of the s, c, and + functions. In this
representation, the outermost function of each expression is s, carry bits from
previous columns are calculated with a single c per column, and the terms in
summations are sorted lexicographically. Two’s complement signed or unsigned
integer multiplication implemented by our candidate designs (see Sect. 6) can
be represented by an expression of this form.


Table 1. Expressions for the final form of the first four output bits from Fig. 1a


Output   Expression
$out_3$  $s(a_0b_3 + a_1b_2 + a_2b_1 + a_3b_0 + c(a_0b_2 + a_1b_1 + a_2b_0 + c(a_1b_0 + a_0b_1 + c(a_0b_0))))$
$out_2$  $s(a_0b_2 + a_1b_1 + a_2b_0 + c(a_1b_0 + a_0b_1 + c(a_0b_0)))$
$out_1$  $s(a_1b_0 + a_0b_1 + c(a_0b_0))$
$out_0$  $s(a_0b_0)$


A summary of our rewriting approach to verify multiplier designs is given 
in Fig. 2. Our method works with design semantics such as SVL where circuit
hierarchy can be maintained and we reason about adder modules and the main 
multiplier module at different stages. As the first step, we work only with adder 
modules (e.g., half/full-adders and final stage adders) instantiated as submodules 
by the candidate multiplier design. We state a conjecture similar to Listing 1.1 
for each adder module. We simplify their gate-level circuit description and prove 
them equivalent to their specification. We save these proofs as rewrite rules where 
lhs is svl-run of adder module and rhs is its specification. Having created these 
rewrite rules for all the adder modules, we start working on the correctness proof 
of the multiplier design as stated in Listing 1.1. On the LHS, as we derive ACL2 
expressions from the definition of multiplier designs (see Sect. 2.2), we replace 
instantiated adder modules with their specification, and we apply two other 
sets of rewrite rules to simplify summation tree and partial product logic. On 
the RHS, we rewrite the multiplier specification into the targeted final form of 
multiplication, and we syntactically compare the two resulting terms to conclude 
our multiplier design proofs. 

We simplify adder and multiplier modules by stating a set of lemmas in the 
form of equality lhs = rhs. These lemmas are used to create a term rewriting 
mechanism where expressions from circuit definitions are unified with lhs and 
replaced with their corresponding rhs. We aim to provide a set of lemmas so 
that such an automated system of rewriting can reduce a wide range of multiplier 
circuit designs to the final form as given in Table 1. In pursuit of this goal, we
devised and experimented with various rewriting strategies; and we came up
with a well-performing heuristic. In the subsections below, we describe these
lemmas separated into two main sets for adder and multiplier modules, and the
general mechanism to prove them equivalent to their specification. The lemmas
we introduce are proved using ACL2, and we omit the proofs for brevity.


[Figure 2 blocks. Adder Module Proofs (for each): Simplify Adder Module;
Adder Specification; Syntactic Comparison. Multiplier Module Proofs: Simplify
Summation Trees; Simplify Partial Products; Rewrite the Multiplier
Specification; Syntactic Comparison.]


Fig. 2. Summary of the overall method


4.1 Adder Module Proofs 


The first step of our rewriting strategy is to represent the outputs of adder 
modules in terms of the s, c, and + functions. We first determine the modules 
that serve as adder components in multiplier designs, such as half-adders, full- 
adders, 4-to-2-compressors, and final stage adders. Then we state a conjecture 
similar to Listing 1.1 where lhs is svl-run of the adder module and rhs is its 
specification. We prove this conjecture with a library of rewrite rules, derived 
from the lemmas given in this section, which can simplify various types of adder 
modules and prove them equivalent to their specification. 

For vector adders, specifications have a fixed format as shown in Table 2; 
however, for single-bit adders, such as full-adders and 4-to-2 compressors, speci- 
fications may vary. The format of these specifications can be of any form as long 
as they are composed of only the s, c, and + functions as given in Table 2. For 
adders that are not given in this table (e.g., 4:2 compressors), users may derive 
their specifications by simplifying them with the lemmas introduced below. 

We expect adder modules to be composed of logical AND ($\wedge$), OR ($\vee$),
XOR ($\oplus$), and NOT ($\neg$) gates in certain patterns. We get expressions for
these circuits’ functionality in terms of these functions through SVL semantics. We
rewrite these expressions with the lemmas given below to simplify them to the
same form as their specification. We define the operators $\wedge$ (and), $\vee$ (or),
$\oplus$ (exclusive or), and $\neg$ (negation) to work with integer-valued bits
(e.g., $1 \wedge 0 = 0$, $1 \vee 1 = 1$, or $0 \oplus 1 = 1$).


Lemma 1. $\forall x, y \in \{0,1\} \quad x \oplus y = s(x + y)$


Lemma 2. $\forall x, y \in \{0,1\} \quad x \wedge y = c(x + y)$


Lemma 3. $\forall x, y, h, g \in \{0,1\} \quad c(x + y + h) \vee (s(x + y) \wedge g) = c(x + y + (h \vee g))$
Table 2. Rewritten outputs of some adders


Output   Half-adder       Full-adder             Vector adders
$out_3$  -                -                      $s(a_3 + b_3 + c(a_2 + b_2 + c(a_1 + b_1 + c(a_0 + b_0))))$
$out_2$  -                -                      $s(a_2 + b_2 + c(a_1 + b_1 + c(a_0 + b_0)))$
$out_1$  $c(a_0 + a_1)$   $c(a_0 + a_1 + a_2)$   $s(a_1 + b_1 + c(a_0 + b_0))$
$out_0$  $s(a_0 + a_1)$   $s(a_0 + a_1 + a_2)$   $s(a_0 + b_0)$


We implement these lemmas as well as some corollaries as rewrite rules so
that terms that can be unified with the lhs of equations are replaced by their
respective rhs. An example corollary is
$\forall x, y, g \in \{0,1\} \quad (x \wedge y) \vee (s(x + y) \wedge g) = c(x + y + g)$,
which can be derived from Lemmas 2 and 3. Similarly,
$\forall x, y, h \in \{0,1\} \quad c(x + y + h) \vee s(x + y) = c(x + y + 1)$
can be derived from Lemma 3. These extra lemmas help expand our coverage to
match more term patterns that may occur.
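
Because all variables range over single bits, the adder lemmas and the two corollaries can be checked exhaustively. The sketch below does so in Python, assuming that s is the remainder modulo 2 and c is the floor of division by 2; the function name `check_adder_lemmas` is hypothetical, and such a check is only a sanity test, not a substitute for the ACL2 proofs.

```python
from itertools import product


def s(x):
    return x % 2


def c(x):
    return x // 2


def check_adder_lemmas():
    """Exhaustively check Lemmas 1-3 and the two corollaries over all bit values."""
    for x, y, h, g in product((0, 1), repeat=4):
        assert (x ^ y) == s(x + y)                                      # Lemma 1
        assert (x & y) == c(x + y)                                      # Lemma 2
        assert (c(x + y + h) | (s(x + y) & g)) == c(x + y + (h | g))    # Lemma 3
        assert ((x & y) | (s(x + y) & g)) == c(x + y + g)               # corollary
        assert (c(x + y + h) | s(x + y)) == c(x + y + 1)                # corollary
        # A typical full-adder carry, (x AND y) OR ((x XOR y) AND g),
        # therefore reduces to c(x + y + g).
        assert ((x & y) | ((x ^ y) & g)) == c(x + y + g)
    return True
```

The last assertion shows the intended use: Lemma 1 turns the XOR into s, after which the first corollary collapses the whole carry expression into a single call of c.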

We add other rewrite rules using elementary properties of $\vee$, $\wedge$, and $+$
that help facilitate simplification. Lemma 3 and some corollaries rewrite terms with
repeated variables. In such cases, in order for the rewriter to match the lhs with
an applicable term, it is necessary to flatten the terms with associativity (e.g.,
$((a + b) + c) = (a + b + c)$) and lexicographically sort them using commutativity
(e.g., $(b + a) = (a + b)$) for every $+$, $\vee$, and $\wedge$ instance. Other
examples of rewrite rules we have in our system implement identity and inverse
properties of addition. Finally, we have a lemma that rewrites the definition of
$\oplus$, which is $(\neg a \wedge b) \vee (a \wedge \neg b)$, in terms of $s$ as given
in Lemma 1.
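
The flattening and sorting step can be sketched as a normalization pass over terms. In this illustration, terms are nested Python tuples in prefix form, the operator names in `AC_OPS` are placeholders, and sorting by `repr` stands in for whatever term order the actual rewriter uses; all of these are assumptions.

```python
# Operators treated as associative and commutative in this sketch.
AC_OPS = {"+", "or", "and"}


def normalize(term):
    """Flatten nested applications of an AC operator and sort its arguments."""
    if not isinstance(term, tuple):
        return term  # variable or constant
    op, *args = term
    args = [normalize(arg) for arg in args]
    if op in AC_OPS:
        flat = []
        for arg in args:
            if isinstance(arg, tuple) and arg[0] == op:
                flat.extend(arg[1:])  # associativity: ((a + b) + c) -> (a + b + c)
            else:
                flat.append(arg)
        args = sorted(flat, key=repr)  # commutativity: fix one argument order
    return (op, *args)
```

After normalization, two sums that differ only in grouping or argument order become syntactically identical, so a rule with repeated variables on its left-hand side can match them.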

Note that we put a restriction on the use of the rewrite rule for Lemma 2 
such that it is used only when x and y are input wires of the adder module. 
The function c is a specification for carry, and not all AND gates may calculate 
carry by themselves. We have observed that only the logical AND of input signals 
should be rewritten to c. Rewriting the other instances of $\wedge$ in terms of c prevents
application of Lemma 3 and complicates our rewriting approach. We enforce this 
restriction in ACL2 through a syntactic check. 

Our experiments given in Sect. 6 demonstrate that the method we described
in this section can automatically simplify vector adders including ripple-carry, 
carry-lookahead [26] and parallel-prefix adders such as Brent-Kung [4], Ladner- 
Fischer [20], Kogge-Stone [19], Han-Carlson [9] and others. 

Reasoning about adder modules before the candidate multiplier module is 
a crucial step in our rewriting mechanism. The functionality of all the adder 
modules should be represented with the s, c, and + functions when expanding 
the definition of the multiplier module. Then, and only then, the multiplier 
design can be simplified and proved correct with the lemmas described in the 
subsequent section. 


4.2 Multiplier Module Proofs


After creating rewrite rules for adder modules, we start working with the correct- 
ness proof of our candidate multiplier design as given in Listing 1.1. Similarly, 
we convert multiplier modules into ACL2 expressions, replace instantiated adder 
modules with their specifications, and perform simplification with a rewriting 
mechanism derived from the lemmas introduced in this section. We first describe 
how we simplify complex expressions that originate from summation tree algo- 
rithms such as Wallace-tree. Secondly, we add more lemmas to simplify partial 
product logic that may be generated with Booth encoding. After rewriting with 
these lemmas, we expect to have simplified multiplier designs to our targeted 
final form as given in Table 1. We rewrite the multiplication specification into 
our final form as well and conclude verification with a syntactic equivalence 
check. 


Simplify Summation Trees. In some integer multiplier designs, summation 
of partial products may be implemented with a very irregular structure, as is the 
case with Wallace-tree like multipliers (see Sect. 2.3), and it can be challenging 
to simplify them to a regular and more easily interpretable form. We describe 
a set of lemmas, solving this problem by providing an efficient and automated 
mechanism for such complex structures. Below, we discuss the simplification 
method for multiplier designs implemented with simple partial products. 

Example 3 shows the term representing the 4th LSB of a Wallace-tree multiplier
output after the adder components have been rewritten in terms of the s, c,
and + functions. Our goal is to reduce such terms to our final form as given in
Table 1.


Example 3. The 4th LSB of the Wallace-tree multiplier output from Fig. 1b after 
adder submodules are rewritten in terms of the s, c and + functions: 


\[
\begin{aligned}
s(\, & s(\, s(a_3b_0 + a_2b_1 + a_1b_2) \\
     & \quad + a_0b_3 \\
     & \quad + c(a_2b_0 + a_1b_1 + a_0b_2)) \\
     & + c(s(a_2b_0 + a_1b_1 + a_0b_2) + c(a_1b_0 + a_0b_1)))
\end{aligned}
\]


In such summation trees, we observe many nested calls for s. These can be 
simplified easily by the following rule. 


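Lemma 4. $\forall x, y \in \mathbb{Z} \quad s(s(x) + y) = s(x + y)$


Example 4. Example 3 simplified with Lemma 4:


\[
\begin{aligned}
s(\, & a_3b_0 + a_2b_1 + a_1b_2 + a_0b_3 \\
     & + c(a_2b_0 + a_1b_1 + a_0b_2) \\
     & + c(s(a_2b_0 + a_1b_1 + a_0b_2) + c(a_1b_0 + a_0b_1)))
\end{aligned}
\]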


Terms derived from summation trees may include many instances for addition 
of two or more calls of c. Since such instances are not present in the final form, 
we try to remove them. That can be done by merging such calls of c through a 
temporary conversion to d as implemented with the lemmas given below. 
Definition 3. We define function d as follows.

\[ \forall x \in \mathbb{Z} \quad d(x) = \frac{x}{2} \]


Lemma 5. $\forall x, y \in \mathbb{Z} \quad c(x) + c(y) = d(x + y - s(x) - s(y))$


Lemma 6. $\forall x, y \in \mathbb{Z} \quad c(x) + d(y) = d(x + y - s(x))$


Lemma 7. $\forall x, y \in \mathbb{Z} \quad d(x) + d(y) = d(x + y)$


Lemma 8. $\forall x \in \mathbb{Z} \quad d(-s(x) + x) = c(x)$


Applying Lemmas 5, 6, 7, and 8 repeatedly to the term in Example 4, we obtain
the term given in Example 5. Since $\forall a, b \in \{0,1\} \quad c(a \wedge b) = 0$,
we have a term that matches the 4th bit of the final form for multiplication as
given in Table 1. It is not required to convert certain instances of d back to c
with Lemma 8; however, we can achieve better proof-time performance by
shrinking terms with this rewrite.


Example 5. Example 4 simplified with Lemmas 5, 6, 7, and 8:


\[
\begin{aligned}
s(\, & a_3b_0 + a_2b_1 + a_1b_2 + a_0b_3 \\
     & + c(a_2b_0 + a_1b_1 + a_0b_2 \\
     & \quad + c(a_1b_0 + a_0b_1)))
\end{aligned}
\]
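
The summation-tree lemmas and the equivalence of the terms in Examples 3 and 5 can be sanity-checked numerically. The sketch below assumes that d halves its argument (consistent with Lemmas 5-8), models it with exact fractions, and encodes the example terms as we read them; the function names are hypothetical, and the check covers only a bounded range of integers.

```python
from fractions import Fraction


def s(x):
    return x % 2


def c(x):
    return x // 2


def d(x):
    return Fraction(x, 2)  # assumption: d(x) = x / 2


def check_summation_lemmas(bound=6):
    """Check Lemmas 4-8 for all integers in [-bound, bound)."""
    for x in range(-bound, bound):
        assert d(-s(x) + x) == c(x)                               # Lemma 8
        for y in range(-bound, bound):
            assert s(s(x) + y) == s(x + y)                        # Lemma 4
            assert c(x) + c(y) == d(x + y - s(x) - s(y))          # Lemma 5
            assert c(x) + d(y) == d(x + y - s(x))                 # Lemma 6
            assert d(x) + d(y) == d(x + y)                        # Lemma 7
    return True


def example_terms(a, b):
    """Evaluate the terms of Example 3 and Example 5 for operand values a and b."""
    def p(i, j):
        return ((a >> i) & 1) & ((b >> j) & 1)  # partial product a_i AND b_j

    x = p(2, 0) + p(1, 1) + p(0, 2)
    y = p(1, 0) + p(0, 1)
    ex3 = s(s(s(p(3, 0) + p(2, 1) + p(1, 2)) + p(0, 3) + c(x)) + c(s(x) + c(y)))
    ex5 = s(p(3, 0) + p(2, 1) + p(1, 2) + p(0, 3) + c(x + c(y)))
    return ex3, ex5
```

For 4-bit unsigned operands, both terms evaluate to bit 3 of the product, which is the property the rewriting is meant to preserve.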


Rewriting with Lemmas 5 and 6 creates new instances of s, which may not 
seem preferable at first glance because terms become less similar to the final 
form. However, we have found that for correct designs, these extra subterms 
cancel out and vanish during the rewriting process. We have seen this to be the 
case even for very large and much more complex terms that may have millions 
of nodes. 

We implement these lemmas as rewrite rules as well as some elementary 
algebraic properties in order to flatten and sort terms lexicographically in
summations. Our rewrite rules do not subsume each other, and they may be applied
in an arbitrary order until none of the rules are applicable.


Simplify Partial Products. Unlike the simple partial product generation 
method, multipliers with Booth encoding implement a more advanced algorithm 
to generate partial products. That results in terms that are more complex (see 
Example 6) than those we have addressed so far. We expand our rewriting mech- 
anism for simplification of summation trees and add more rewrite rules for auto- 
mated simplification of partial products such as the ones generated with Booth 
encoding and sign extension tricks. 
Example 6. Below is a term for the second LSB of a multiplier output,
implemented with Booth radix-4 encoding and before any simplification for partial
products took place:

\[
\begin{aligned}
s(\, & [\neg b_1 b_0 a_1 \vee b_1 \neg b_0 \neg a_0 \vee b_1 b_0 \neg a_1] \\
     & + c([b_1 b_0 \vee b_1 \neg b_0] \\
     & \quad + [b_1 \neg b_0 \vee \neg b_1 b_0 a_0 \vee b_1 b_0 \neg a_0]))
\end{aligned}
\]

Similar to other multiplier verification methods [25], we perform algebraic
rewriting on the $\oplus$, $\vee$, and $\neg$ functions with the following lemmas.


Lemma 9. $\forall x \in \{0,1\} \quad \neg x = 1 - x$


Lemma 10. $\forall x, y \in \{0,1\} \quad x \vee y = x + y - xy$


Lemma 11. $\forall x, y \in \{0,1\} \quad x \oplus y = x + y - xy - xy$


Example 7. Example 6 rewritten with Lemmas 9, 10, and 11 as well as elementary
algebraic properties.


\[
\begin{aligned}
s(\, & b_1 + b_0a_1 - b_1a_0 + b_1b_0a_0 - b_1b_0a_1 - b_1b_0a_1 \\
     & + c(b_1 + b_1 + b_0a_0 - b_1b_0a_0 - b_1b_0a_0))
\end{aligned}
\]


We would like such expressions to be simplified to our final form. When deriving 
our rewrite rules, we concentrate on the terms with negative and/or duplicate 
arguments and realize that applying the following set of lemmas is sufficient to 
simplify such complex expressions. 


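Lemma 12. $\forall x, y \in \mathbb{Z} \quad s((-x) + y) = s(x + y)$


Lemma 13. $\forall x, y \in \mathbb{Z} \quad c((-x) + y) = (-x) + c(x + y)$


Lemma 14. $\forall x, y \in \mathbb{Z} \quad d((-x) + y) = (-x) + d(x + y)$


Lemma 15. $\forall x, y \in \mathbb{Z} \quad s(x + x + y) = s(y)$


Lemma 16. $\forall x, y \in \mathbb{Z} \quad c(x + x + y) = x + c(y)$


Lemma 17. $\forall x, y \in \mathbb{Z} \quad d(x + x + y) = x + d(y)$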

Example 8. Below is the resulting term after Example 7 is simplified using
Lemmas 12-17 and elementary algebraic properties. We obtain a term matching
the final form in Table 1.


\[ s(b_0a_1 + b_1a_0 + c(b_0a_0)) \]
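
As with the earlier lemmas, the partial-product lemmas and the equivalence between the unsimplified Booth term and its final form can be checked by brute force. The sketch below rests on several assumptions: d halves its argument, the right-hand sides used for Lemmas 13, 15, and 17 follow the pattern of Lemmas 14 and 16, and the bracketed minterms of Example 6 are encoded as we read them. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def s(x):
    return x % 2


def c(x):
    return x // 2


def d(x):
    return Fraction(x, 2)  # assumption: d(x) = x / 2


def check_partial_product_lemmas(bound=5):
    for x, y in product((0, 1), repeat=2):
        assert (x ^ 1) == 1 - x                       # Lemma 9 (negation of a bit)
        assert (x | y) == x + y - x * y               # Lemma 10
        assert (x ^ y) == x + y - x * y - x * y       # Lemma 11
    for x in range(-bound, bound):
        for y in range(-bound, bound):
            assert s(-x + y) == s(x + y)              # Lemma 12
            assert c(-x + y) == -x + c(x + y)         # Lemma 13
            assert d(-x + y) == -x + d(x + y)         # Lemma 14
            assert s(x + x + y) == s(y)               # Lemma 15
            assert c(x + x + y) == x + c(y)           # Lemma 16
            assert d(x + x + y) == x + d(y)           # Lemma 17
    return True


def example6(a0, a1, b0, b1):
    """Booth radix-4 term for the second LSB before simplification."""
    def n(v):
        return 1 - v  # bit negation

    t1 = (n(b1) & b0 & a1) | (b1 & n(b0) & n(a0)) | (b1 & b0 & n(a1))
    t2 = (b1 & b0) | (b1 & n(b0))
    t3 = (b1 & n(b0)) | (n(b1) & b0 & a0) | (b1 & b0 & n(a0))
    return s(t1 + c(t2 + t3))


def example8(a0, a1, b0, b1):
    """The simplified term matching the final form."""
    return s(b0 * a1 + b1 * a0 + c(b0 * a0))
```

Comparing `example6` and `example8` over all sixteen bit assignments confirms that the rewriting steps preserve the value of the output bit.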


We implement these lemmas as rewrite rules along with the rules for sim- 
plification of summation trees. All of these lemmas automatically work together 
without any user intervention. 

Algebraic rewriting of logical gates can be very expensive in terms of time 
and memory. For this reason, we limit the application of these rules to the partial 
product logic only. For example, if applied indiscriminately, Lemmas 10 and 11
can cause terms to grow exponentially. Even though partial product generation
logic may allocate a large area in multipliers, rewriting the adders to the s, c,
and + functions isolates partial products from each other and segregates them
into small chunks. We expect that expressions representing partial products are
composed of the $\vee$, $\wedge$, $\oplus$, and $\neg$ functions only. Therefore, we
restrict Lemmas 9-11 to apply to terms that are composed of these functions only;
and we restrict Lemmas 12-17 to apply to terms that are composed of minterms,
and the $-$ and $+$ functions only. For instance, in Lemma 13, if we are unifying
x with a term
that contains an instance of s, c or d, then we prevent rewriting with a syntactic 
check. This heuristic helps contain this potentially expensive approach to only 
local and smaller terms. 


Rewrite the Multiplier Specification. In our proposed rewriting scheme, 
we have a targeted representation for each output bit of multiplication as 
given in Definition 2. The rewriter cannot derive this form directly from 
the built-in ACL2 multiplication (*) function. Thus, we provide a recursively
defined function multbycol that follows the formula in Definition 2. We prove 
multbycol to be equivalent to the * function. When the rewriter works on 
the conjecture stating the correctness of a multiplier design as shown in List- 
ing 1.1, (truncate size (* a b)) is rewritten to (multbycol a b size). 
The rewriter can then efficiently convert the specification into the targeted final 
form. 

Using the rewriting mechanism described in this section, we can verify multipliers
with Baugh-Wooley, signed/unsigned Booth radix-4, and simple partial
product generation algorithms with various summation tree algorithms such as
Wallace and Dadda trees. Note that Lemmas 9-17 work together with Lemmas 4-8
but conflict with Lemmas 1-3. This is the reason why our method relies
on semantics where the design hierarchy is maintained so that we can simplify 
the logic in adder modules with Lemmas 1-3 and simplify the remainder of a 
multiplier design with Lemmas 4-17 at a different time. When this separation 
is possible, multiplier designs are verified fully automatically without requiring 
users to designate the type of algorithm used. The complete process of proving 
the equivalence of semantics of a multiplier design to its specification is verified 
using ACL2. 


5 Termination 


Our rewriter does not enforce proof of termination for rewrite rules. The program 
terminates either when there are not any applicable rules or when a certain 
number of steps are taken, which may happen if that number is too small for the 
current conjecture, there is a loop between rules, or some rules grow some terms 
indefinitely. Even though it is not required by the rewriter, it is important to 
show that our rewriting algorithm requires a limited number of steps and does 
not run indefinitely.

Terms from conjectures change every time a rewrite rule is applied. Therefore,
for each of our rewriting algorithms (adder and multiplier module simplification),
we define a measure calculated on the term and show that it decreases every time
we rewrite with one of our lemmas. We first define the measure for simplifying
adder modules (Lemmas 1-3). Since the two phases are carried out separately, we
define another measure for the summation tree and partial product simplification
algorithms (Lemmas 4-17). For brevity, we omit the discussion for termination with other
lemmas pertaining to elementary algebraic properties such as commutativity and 
associativity. 


5.1 Measure for Adder Module Simplification 


The first part of our multiplier verification algorithm is simplifying the logic in 
adder components and rewriting them in terms of the s, c, and + functions. 
Below, we define auxiliary functions and a measure that guarantees termination 
of this part of the algorithm that rewrites terms with Lemmas 1-3. 


Definition 4 (f1). Function f1 counts the number of symbols (constants, functions,
and variables) in a term.


Definition 5 (f2). Function f2 counts the occurrences of ∧ and ⊕ in a term.


For example, computing f1 and f2 on the term s(x ⊕ y + x ∧ z + c(x ⊕ y)) yields
13 and 3, respectively.


Definition 6. We define a measure m1 as follows, where the resulting ordered
pairs are compared lexicographically.


m1(term) = ⟨f2(term), f1(term)⟩


Because the pairs produced by m1 are ordered lexicographically, the value of
m1 decreases if f2 decreases (no matter the value of f1), or f2 stays the same
and f1 decreases. Rewriting with Lemmas 1, 2, and 3 decreases f2. Rewriting
with some corollaries does not change the value of f2 but decreases f1. For
example, rewriting with the corollary ∀x, y, h ∈ {0, 1}: c(x + y + h) ∨ s(x + y) =
c(x + y + 1) does not change f2 but decreases f1. In short, every step taken
with these lemmas decreases the value of m1 calculated on the resulting term.
Therefore, the rewriting algorithm for adder modules terminates. 
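
The measure m1 can be made concrete with a small sketch. Terms are modeled here as nested Python tuples, an encoding chosen purely for illustration (the rewriter itself operates on ACL2 terms), and the operator names "xor", "and", and "+" are likewise assumptions. Python compares tuples lexicographically, which matches the ordering used for m1.

```python
# A term is either a string (variable or constant) or a tuple (op, arg, ...).
COUNTED_BY_F2 = {"and", "xor"}   # the operators that f2 counts


def f1(term):
    """Number of symbols: every operator occurrence, variable and constant."""
    if isinstance(term, str):
        return 1
    return 1 + sum(f1(arg) for arg in term[1:])


def f2(term):
    """Number of occurrences of 'and' and 'xor' in the term."""
    if isinstance(term, str):
        return 0
    own = 1 if term[0] in COUNTED_BY_F2 else 0
    return own + sum(f2(arg) for arg in term[1:])


def m1(term):
    """The pair (f2, f1); Python orders tuples lexicographically."""
    return (f2(term), f1(term))


# s(x xor y + x and z + c(x xor y)), with + treated as a binary operator.
x_xor_y = ("xor", "x", "y")
example = ("s", ("+", ("+", x_xor_y, ("and", "x", "z")), ("c", x_xor_y)))
```

On the example term this yields f1 = 13 and f2 = 3, in agreement with the values given after Definition 5. A rewrite step is measure-decreasing exactly when m1(after) < m1(before) under this tuple comparison.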


5.2 Measure for Multiplier Module Simplification 


Rewriting for summation tree and partial product generation algorithms is performed
together with a rewriting algorithm derived with Lemmas 4-17, excluding
Lemmas 1-3. Therefore, we define a single measure to describe the termination
of this part of the rewriting mechanism. Below we give definitions for some auxiliary
functions and our measure.


Definition 7 (f3). Function f3 sums the occurrence-depth of negative
minterms, where the occurrence-depth is calculated with respect to the overall
term.


For example, computing f3 on the term s(x0x1 + c(−x2y0 + c(−x3y1))) yields
5 because its negative minterms −x2y0 and −x3y1 occur at depth 2 and 3,
respectively. These values can be calculated by counting the unclosed parentheses
from the beginning up to the occurrence of these terms.


Definition 8 (f4). Function f4 computes the number of unique occurrences of
functions {c, d, ¬, ⊕, ∨}.


For example, computing f4 for the term c(x0) + s(x1 + c(x0) + c(x1)) yields 2
because even though there are three instances of c, the second occurrence of
c(x0) is not counted.


Definition 9. We define measure m2 to return ordered triples as follows, to be
compared lexicographically.


m2(term) = ⟨f4(term), f3(term), f1(term)⟩


The value of m2 decreases if f4 decreases, or f4 stays the same and f3
decreases, or f4 and f3 stay the same and f1 decreases. Below we discuss how
rewriting with Lemmas 4-17 satisfies this measure for termination.
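
The auxiliary functions f3 and f4 can be sketched in the same spirit. The tuple encoding of terms, the operator names ("neg" for a negated minterm, "*" for a minterm product), and the reading of "unique occurrences" as distinct subterms are assumptions made for this illustration, not the ACL2 implementation.

```python
# A term is either a string (variable or constant) or a tuple (op, arg, ...).
PAREN_OPS = {"s", "c", "d"}                    # functions that open a parenthesis
COUNTED_BY_F4 = {"c", "d", "not", "xor", "or"}


def f3(term, depth=0):
    """Sum of the depths at which negated minterms occur."""
    if isinstance(term, str):
        return 0
    op, args = term[0], term[1:]
    if op == "neg":                            # a negated minterm contributes its depth
        return depth
    inner = depth + 1 if op in PAREN_OPS else depth
    return sum(f3(arg, inner) for arg in args)


def f4(term):
    """Number of distinct subterms headed by c, d, not, xor or or."""
    seen = set()

    def walk(t):
        if isinstance(t, str):
            return
        if t[0] in COUNTED_BY_F4:
            seen.add(t)                        # identical subterms are stored once
        for arg in t[1:]:
            walk(arg)

    walk(term)
    return len(seen)


# s(x0x1 + c(-x2y0 + c(-x3y1)))
f3_example = ("s", ("+", ("*", "x0", "x1"),
                    ("c", ("+", ("neg", ("*", "x2", "y0")),
                           ("c", ("neg", ("*", "x3", "y1")))))))

# c(x0) + s(x1 + c(x0) + c(x1))
f4_example = ("+", ("c", "x0"),
              ("s", ("+", ("+", "x1", ("c", "x0")), ("c", "x1"))))
```

Under these assumptions the two example terms evaluate to 5 and 2, matching the values stated with Definitions 7 and 8.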

Rewriting with Lemmas 4 and 8 does not change the value of f4. For both 
lemmas, if x is unified with a term that contains a negative minterm, then the 
value of f3 decreases, otherwise, f3 remains the same. By removing an instance 
of s, rewriting with both lemmas decreases f1 and consequently m2.

Rewriting with Lemmas 5, 6, 7, 9, 10, and 11 decreases f4, and therefore m2,
by removing an instance of d, c, ¬, ∨, or ⊕. Even though rewriting with some of
these lemmas creates copies of terms, the value of f4 decreases because it does 
not count the same term more than once. 

Rewriting with Lemmas 12-17 does not affect the value of f4 since they 
are restricted to rewrite terms that contain only the + and — functions, and 
minterms. For Lemmas 12, 13, and 14, x can only be unified with a positive 
minterm. Therefore, rewriting with these lemmas does not change f3. For Lem- 
mas 15, 16, and 17, if x is unified with a negative minterm, then f3 decreases. 
Otherwise, f3 remains the same and f1 decreases.

In short, rewriting with Lemmas 1-3 decreases the measure m1 and rewriting
with Lemmas 4-17 decreases the measure m2. Therefore, our proposed rewriting
mechanism terminates. 


6 Experiments 


In this section, we present our experimental results and compare them to the 
other state-of-the-art tools for the automated verification of multiplier designs. 
We have gathered a large set of multipliers from 3 different generators and run
all the experiments for other verification tools and ours on the same computer
(a 2014 model iMac with an Intel(R) Core(TM) i7-4790K CPU @ 4.00 GHz and 32
GB of system memory) for comparison. The instructions and a ready-to-run VM
image to run our tool and reproduce these experimental results can be found 
online at http://mtemel.com/mult.html. 

For benchmarking, we used 3 different generators. The tool from Homma 
et al. [10] generates Booth-encoded signed and unsigned multipliers (input size
up to 64 bits) with various summation tree and final stage adders. Designs from
Homma et al. have multiple copies of half/full-adder modules as well as some
other adder modules. Since our method requires reasoning about each adder
module, we wrote a function that scans the modules and automatically simplifies
them as described in Sect. 4.1. Secondly, we used SCA-genmul [24] to
generate simple unsigned and Baugh-Wooley based signed (also referred to as 
simple signed) multipliers. This tool does not generate Booth-encoded multipli- 
ers. Finally, we used another multiplier generator [34] that can generate large 
Booth-encoded multipliers. 

We have measured the complete proof time for each benchmark, when available,
and compared our results to the work of D. Kaufmann et al. [16] and A.
Mahzoon et al. [23]. These methods are based on computer algebra, and they
were the best performing tools at the time this paper was written. Since we verified
the correctness of our tool using ACL2, we do not generate certificates. D.
Kaufmann et al. implement their method in a stand-alone C program but they
generate certificates to check their proofs. We measured the total time to verify/certify
and check certificates. A. Mahzoon et al. also test their method with
a stand-alone C program but it does not produce any certificates. Even though 
it is not a complete comparison, we still include the results of their tool for the 
same benchmarks. 

When we run our tool on these benchmarks, we only need to identify the
names of the adder modules and their I/O sizes, the multiplier I/O size, and whether
the multiplier performs signed or unsigned multiplication in order to determine its
specification. The proofs finish automatically, and users can see the specification
explicitly to validate what is proved. The other tools are not interactive and use 
some heuristics to decide on the specification internally based on the design. 

D. Kaufmann et al. [16] and A. Mahzoon et al. [23] both use AIGs as inputs, 
and we use SVL [83], all of which are translated from (System) Verilog using 
external tools. For the other tools, we used Yosys [39] and ABC [3] to cre- 
ate AIGs, without any optimization. For our tool, we created SVL netlists as 
described in Sect. 2.2. Since we compare the performance of different verification 
methods, we do not include the translation time in any of these results. 

Table 3 shows the result of experiments run with a collection of circuits. The
benchmarks are described with the generator, partial product generation algorithm,
summation tree algorithm, and final stage adder. Generators are tem [34],
sca [24], and hom [10]. Partial product generation algorithms are sp (simple
unsigned/signed or Baugh-Wooley-based), and bp (unsigned and signed Booth
radix-4 encoded). Summation tree algorithms are dt (Dadda tree), wt (Wallace
tree), cwt (counter-based Wallace tree), ar (array), os (overturned-stairs
tree), bdt (balanced delay tree), and 4:2 (4-to-2 compressor tree). Finally, the
final stage adders are bk (Brent-Kung), lf (Ladner-Fischer), rc (Ripple-carry),
ks (Kogge-Stone), csk (Carry-skip), hc (Han-Carlson), and cla (Carry-lookahead).
The selection of benchmarks was arbitrary but we have concentrated on Wallace-tree-like
multipliers with complex final stage adders as they have a more
widespread industrial application. For experiments with 64 x 64 and 128 x
128 multipliers, we set the time limit to 1.5 h, and for larger designs, we set the
limit to 4.5 h. The results are given in seconds rounded to the nearest integer.


Table 3. Proof-time results in seconds for various multiplier designs


Size Benchmark AM [23]a DK [16] Our tool
Unsigned Unsigned Signed Unsigned Signed
64 x 64 sca sp-dt-bk 39 6 6 1 1
sca sp-wt-lf 33 6 6 1 1
sca sp-cwt-ks TO 65 58 1 1
sca sp-ar-rc 23 5 5 1 1
tem sp-dt-ks 173 1 1
tem sp-wt-lf 33 6 6 1 1
tem bp-dt-hc TO 44 49 1 1
tem bp-wt-rp TO 45 49 2 2
hom bp-dt-ks 288 8 TE 2 2
hom bp-bdt-hc TO 7 7 2 2
hom bp-os-bk 71 6 TO 3 3
hom bp-wt-cla 108 24 21 13 12
hom bp-4:2-lf TE 7 T: 3 3
128 x 128 sca sp-dt-bk 643 33 36
sca sp-wt-lf 633 34 38
sca sp-cwt-ks TO TO TO
sca sp-ar-rc 384 27 27 18 18
tem sp-dt-ks TO 47 49
tem sp-wt-lf 650 40 40
tem bp-dt-hc TO 877 1037
tem bp-wt-rp TO 918 1067 12 13
256 x 256 sca sp-dt-bk TO 213 209 9 11
sca sp-wt-lf 15351 226 223 11 13
sca sp-cwt-ks TO TO TO 13 15
tem sp-dt-ks TO 234 232 10 12
tem sp-wt-lf 15552 220 221 10 12
tem bp-dt-hc TO 11555 14043 41 47
tem bp-wt-rp TO 11975 14264 54 58
512 x 512 sca sp-dt-bk TO 1562 1562 53 64
sca sp-wt-lf TO 1588 1577 61 76
tem sp-dt-ks TO 1655 1655 68 75
tem sp-wt-lf TO 1604 1609 65 82
tem bp-dt-hc TO TO TO 246 281
tem bp-wt-rp TO TO TO 371 380
1024 x 1024 sca sp-dt-bk TO 13746 13247 339 397
sca sp-wt-lf TO 13560 14005 322 345
tem sp-dt-ks TO 14125 15198 324 392
tem sp-wt-lf TO 13664 13708 327 393


a Does not produce certificates.
TE: Terminated with an error. TO: Time-out, 5400 s (90 min) for 64 x 64 and
128 x 128, 16200 s (270 min) for the rest.

Our tool outperformed the other tools on every benchmark we tested.
Our method is shown to verify benchmarks the others cannot and
produce a more homogeneous timing performance across different designs. A. 
Mahzoon et al. [23] work only on unsigned multipliers. Both A. Mahzoon et al. 
and D. Kaufmann et al. [16] give fluctuating results for multipliers with different 
architectures and/or different generators. For some benchmarks, the other tools 
terminated with an error such as segmentation fault (marked with TE). Our 
work is more resilient to differences in designs and it scales much better (proof
times increase by 4.5-6 times when circuit size grows 4 times). For Wallace-tree-like
multipliers with simple partial products, about 40% of the time on average is
spent on simplification with the lemmas given in Sect. 4, and the rest is spent on
conversion of SVL semantics to ACL2 expressions. For multipliers with Booth encoding,
over 70% of the time is spent on partial product simplification. Array
multipliers are the only type of circuit for which our tool struggles to scale. We 
believe that is because the minimal parallelism this circuit implements causes 
our rewriting engine to do much more work as compared to other multiplier 
structures. Even though memory use is not reported here, it scales the same way 
as timings, and it grows as big as 30 GB for the largest (1024 x 1024) circuits 
we have tested. 

Additionally, since integer multipliers are used to implement floating-point 
operations, we tested our method in a correctness proof for an implementation 
of a floating-point multiply-add instruction for Centaur Technology, and we got 
similar results. 


7 Related Work and Conclusion 


Having described our method, we now compare it with the related work. Well- 
known methods to verify multipliers include generic reasoning methods such 
as BDDs and SAT solvers. However, these tools do not scale well with large 
multipliers. For the last few years, efforts to verify large integer multipliers
have explored the symbolic computer algebra approach based on Gröbner
bases [7, 16, 22, 23, 28, 37]. As far as we are aware, all these tools are stand-alone,
unverified C programs and none of them except D. Kaufmann et al. [16] produces
certificates. The soundness and completeness of this approach is shown
only in theory [17]. We compared our method to the studies with the best timing
performance [16, 23]. The tools implementing these methods identify adder
components in designs automatically and perform some rewriting. Their rewriting
strategy is different from ours; their method does not rely on maintained
design hierarchy and separate reasoning of adder and multiplier modules. Even
though they provide a more automatic system, their application appears to be 
limited to some known patterns. Additionally, our tool is implemented in an
interactive theorem prover, which can enable users to carry out more complicated proofs
such as the correctness of floating-point circuits. The limitation of our method 
is that it relies on maintaining circuit hierarchy. Should this pose a problem for 
some designs, it might be possible for our method to be adapted in the future 
to work with flattened modules and identify adder components similarly to the 
related work. 

When a proof fails for a multiplier design, our tool does not output a user- 
friendly message. We will work to improve our tool to process the resulting 
terms from failed verification attempts and generate counterexamples for incor- 
rect designs. 

In this paper, we have presented an efficient method with a proven tool to 
verify large and complex integer multipliers. With maintained circuit hierarchy,
we can automatically verify very irregular multiplier designs; for example,
various 1024 x 1024 Wallace-tree-like multipliers can be verified in less than
10 min. We believe that our tool can find broader applications because it can
be extended to verify circuits, such as floating-point multipliers, that include an 
integer multiplier as a submodule. 


Acknowledgments. We would like to thank the reviewers for their feedback, and 
Matt Kaufmann for his helpful directives when implementing this method in ACL2. 
This material is based upon work supported in part by DARPA under Contract No. 
FA8650-17-1-7704. A part of this work was completed while M. Temel was working at
Centaur Technology.


Abstract. We present a new semantic gate extraction technique for
propositional formulas based on interpolation. While known gate detec- 
tion methods are incomplete and rely on pattern matching or simple 
semantic conditions, this approach can detect any definition entailed by 
an input formula. 

As an application, we consider the problem of computing unique strat- 
egy functions from Quantified Boolean Formulas (QBFs) and Depen- 
dency Quantified Boolean Formulas (DQBFs). Experiments with a pro- 
totype implementation demonstrate that functions can be efficiently 
extracted from formulas in standard benchmark sets, and that many 
of these definitions remain undetected by syntactic gate detection. 

We turn this into a preprocessing technique by substituting unique 
strategy functions for input variables and test solver performance on the 
resulting instances. Compared to syntactic gate detection, we see a sig- 
nificant increase in the number of solved QBF instances, as well as a 
modest increase for DQBF instances. 


1 Introduction 


Due to the effectiveness of modern satisfiability (SAT) solvers [20], propositional 
logic has become the language of choice for encoding hard combinatorial prob- 
lems arising in areas such as electronic design automation [50] and AI planning. 
Since many of these problems are hard for levels of the polynomial hierarchy 
beyond NP, their propositional encodings can be exponentially larger than their 
original descriptions. This imposes a limit on the problem instances that can 
be feasibly solved even with extremely efficient SAT solvers, and has prompted 
research on decision procedures for more succinct logical formalisms such as 
Quantified Boolean Formulas (QBFs). 

Quantified Boolean Formulas (QBFs) are propositional formulas combined 
with universal and existential quantification over truth values and offer much 
more succinct encodings of problems from domains such as planning and synthesis
[12]. At the same time, QBF evaluation is PSPACE-complete, and in spite
of substantial progress in solver technology, many practically relevant instances
remain hard to solve.

In part, this hardness appears to be a matter of encoding. The most com- 
monly used format for QBFs is Prenex Conjunctive Normal Form (PCNF). A 
PCNF formula consists of a quantifier prefix and a matrix in conjunctive normal 
form. As in the case of propositional logic, any QBF can be converted to PCNF 
with linear overhead but this transformation is known to adversely affect solver 
performance [1]. This appears to be due to two issues: First, conversion to CNF 
causes a bias towards reasoning about unsatisfiability while making it difficult to 
reason about solutions, violating the inherent duality of QBF solving. Second, 
prenexing introduces spurious variable dependencies that needlessly constrain 
solvers [5,40]. In light of these issues, researchers have introduced two new formats
for representing non-CNF (and even non-prenex) QBFs in the QCIR [30]
and QAIGER standards, and solvers supporting these standards have been developed.
When only a PCNF encoding is available, gate extraction techniques can
be used to (re)construct a non-CNF QBF [21]. Syntactic gate extraction relies 
on the detection of patterns of clauses and auxiliary variables introduced when 
converting a propositional formula to CNF [16]. The corresponding algorithms 
are fast but incomplete and can only detect definitions from a pre-defined library 
of gates. 

In this paper, we introduce a new semantic gate extraction technique based 
on SAT solving and interpolation. In contrast to known approaches, this method 
is complete: a definition ψ of a variable x can be extracted from a propositional
formula φ whenever the equivalence x ↔ ψ is entailed by φ. We obtain this
result as a generalization of recent work that leverages definability for propositional
model counting [25,33]. Owing to a result known as Padoa’s Theorem,
determining whether a variable x is definable in terms of a set X of variables is in
coNP and can be decided by a SAT call [33]. We show that a definition ψ of x in
terms of X can be obtained as an interpolant of the formula passed to the SAT
solver (Theorem 2).
For SAT solvers that use a proof system with feasible interpolation—in particu- 
lar, CDCL solvers that generate resolution proofs [32]—this means a definition 
can be efficiently extracted from a proof of definability. 

We apply this new gate extraction technique to identify unique strategy func- 
tions of QBFs and Dependency QBFs. In a controller synthesis setting, a variable 
with a unique strategy function corresponds to a control signal with a unique 
(as a Boolean function) implementation. We can add such an implementation to 
the specification without affecting the remaining control signals. 

Experiments with a prototype show that definitions can be efficiently com- 
puted for formulas from standard QBF benchmark sets, and that for many 
instances a large fraction of variables have unique strategy functions that can- 
not be identified by syntactic gate detection. We further test the performance of 
solvers on instances obtained by replacing input variables with their definitions. 
For 2QBF formulas and PCNF formulas, this significantly increases the number 
of instances solved by some systems compared to purely syntactic gate extrac- 
tion. Our experiments further show that semantic gate detection is orthogonal 
to techniques implemented in state-of-the-art preprocessors.

Semantic gate detection is efficient and conceptually simple. By definition,
it preserves logical equivalence and is compatible with strategy extraction. As 
such, we believe it is an essential addition to the state of the art in preprocessing 
(D)QBF. 


2 Preliminaries 


We assume a countably infinite set V of propositional variables and consider
propositional formulas constructed from V using the connectives ¬ (negation),
∧ (conjunction), ∨ (disjunction), → (implication), and ↔ (the biconditional).
For a propositional formula φ, we write var(φ) to denote the set of variables
occurring in φ. A literal is a variable v or a negated variable ¬v. A clause
is a finite disjunction of literals. A clause is tautological if it contains both v
and ¬v for some variable v. A propositional formula is in conjunctive normal form
(CNF) if it is a finite conjunction of non-tautological clauses. An assignment of
a subset X ⊆ V of variables is a function that maps X to the set {0,1} of truth
values. For a set X of variables we let [X] denote the set of assignments of X. Two
assignments σ : X → {0,1} and τ : Y → {0,1} agree on a subset W ⊆ X ∩ Y of
their common domain if σ(w) = τ(w) for each w ∈ W. For two assignments σ :
X → {0,1} and τ : Y → {0,1} that agree on the entire intersection of their
domains we define the combined assignment σ ∪ τ : X ∪ Y → {0,1} as (σ ∪ τ)(v) =
σ(v) if v ∈ X and (σ ∪ τ)(v) = τ(v) otherwise.

For a propositional formula φ and an assignment τ : X → {0,1} with
var(φ) ⊆ X, we let φ[τ] denote the truth value obtained by evaluating φ under τ.
The formula φ is satisfied by τ if φ[τ] = 1. In this case we call τ a satisfying
assignment of φ. Otherwise, if φ[τ] = 0, formula φ is falsified by τ. A formula
is satisfiable if it has a satisfying assignment, otherwise it is unsatisfiable. A
formula φ implies a formula ψ if φ ∧ ¬ψ is unsatisfiable.

We consider Quantified Boolean Formulas (QBFs) in Prenex Normal Form
(PNF). A QBF Φ = Q.φ in PNF consists of a quantifier prefix Q and a
propositional formula φ, called the matrix of Φ. The quantifier prefix is a
sequence Q_1 x_1 ... Q_n x_n where Q_i ∈ {∀, ∃} and the x_i are pairwise distinct
variables for 1 ≤ i ≤ n. The quantifier prefix defines an ordering <_Φ on its
variables as x_i <_Φ x_j for 1 ≤ i < j ≤ n. We assume that QBFs do not contain
free variables and every variable in the quantifier prefix appears in the matrix,
formally {x_1, ..., x_n} = var(φ). Accordingly, we write var(Φ) = var(φ) for the
set of variables appearing in the QBF Φ. We further assume that every variable
of Φ occurs exactly once in its quantifier prefix. The set of existential variables
of Φ is var_∃(Φ) = { x_i | 1 ≤ i ≤ n, Q_i = ∃ }, and the set of universal variables
of Φ is var_∀(Φ) = { x_i | 1 ≤ i ≤ n, Q_i = ∀ }. For a variable x ∈ var(Φ), we
let type_Φ(x) = Q if x ∈ var_Q(Φ), for Q ∈ {∀, ∃}, omitting Φ from the subscript
if the QBF is understood.

Let Φ be a QBF and let x ∈ var(Φ) be one of its variables with type(x) = Q.
A strategy function for x is a function f : [var(Φ) \ var_Q(Φ)] → {0,1} such
that f(τ) = f(τ′) for any two assignments τ and τ′ that agree on variables in
{v ∈ var(Φ) \ var_Q(Φ) | v <_Φ x}.¹ Given an indexed family F = {f_x}_{x∈X} of
strategy functions such that X ⊆ var_Q(Φ) for Q ∈ {∀, ∃}, the response of F
to an assignment τ : (var(Φ) \ var_Q(Φ)) → {0,1} is the assignment F(τ) :
X → {0,1} given by F(τ)(x) = f_x(τ). An existential winning strategy (for Φ)
is a family F = {f_u}_{u∈var_∃(Φ)} of strategy functions such that, for any universal
assignment τ : var_∀(Φ) → {0,1}, the assignment τ ∪ F(τ) satisfies the matrix
of Φ. Dually, a universal winning strategy (for Φ) is a family F = {f_u}_{u∈var_∀(Φ)} of
strategy functions such that, for any existential assignment σ : var_∃(Φ) → {0,1},
the assignment σ ∪ F(σ) falsifies the matrix. A QBF Φ is true if there is an
existential winning strategy for Φ, and false if there exists a universal winning
strategy for Φ.
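
For small formulas, the truth value of a PNF QBF can be computed by recursing over the quantifier prefix, which agrees with the strategy-based definition above. The following is only an illustrative sketch: the prefix encoding as a list of (quantifier, variable) pairs and the function name qbf_true are assumptions, and the procedure takes time exponential in the number of variables.

```python
from typing import Callable, Dict, List, Optional, Tuple

Assignment = Dict[str, bool]
Prefix = List[Tuple[str, str]]    # e.g. [("forall", "x"), ("exists", "y")]


def qbf_true(prefix: Prefix,
             matrix: Callable[[Assignment], bool],
             partial: Optional[Assignment] = None) -> bool:
    """Evaluate a prenex QBF by expanding its quantifiers from left to right."""
    partial = {} if partial is None else partial
    if not prefix:
        return matrix(partial)
    quantifier, variable = prefix[0]
    outcomes = [
        qbf_true(prefix[1:], matrix, {**partial, variable: value})
        for value in (False, True)
    ]
    # A universal variable needs both branches to succeed, an existential one.
    return all(outcomes) if quantifier == "forall" else any(outcomes)


def x_iff_y(a: Assignment) -> bool:
    return a["x"] == a["y"]
```

For instance, ∀x∃y. x ↔ y evaluates to true (the Skolem function for y copies x), whereas ∃y∀x. x ↔ y evaluates to false because y may not depend on x.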


3 Semantic Gate Extraction by Interpolation 


This work builds on an application of propositional definability to the model 
counting problem [33]. We begin by recalling two basic concepts. 


Definition 1. Let φ be a formula, let X be a subset of its variables, and let x
be a variable. Variable x is defined in terms of X in φ if σ(x) = τ(x) for any
two satisfying assignments σ and τ of φ that agree on X. A definition of x by X
in φ is a formula ψ with var(ψ) ⊆ X such that σ(x) = ψ[σ] for any satisfying
assignment σ of φ.


It is readily verified that there is a definition for every variable that is defined. 
Lagniez et al. [33] observe that the following result can be used to determine 
whether a variable is defined [34,39]. 


Theorem 1 (Padoa’s Theorem). Let φ be a formula and let X ⊆ var(φ) be
a subset of its variables. Let φ′ be the propositional formula obtained by replacing
every variable y ∈ var(φ) \ X by a new variable y′. Let x ∈ var(φ) be a variable.
If x ∉ X, then x is defined in φ by X if, and only if, the formula φ ∧ x ∧ φ′ ∧ ¬x′
is unsatisfiable.


For the purposes of preprocessing in model counting, it is sufficient to know that
a variable x is defined by X in φ, and the above result shows that this can
be decided by a SAT solver. It is not necessary to compute the corresponding
definition, whose size is not polynomially bounded in the size of φ under common
assumptions in computational complexity [33].
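
The check in Padoa’s Theorem can be illustrated on toy formulas by enumerating assignments instead of calling a SAT solver. In the sketch below, a formula is a Python predicate over a dictionary of truth values. This representation, the helper names, and the brute-force search are assumptions made for illustration and are not how a practical tool would decide definability.

```python
from itertools import product


def assignments(names):
    """Yield every assignment of the given variable names as a dict."""
    for bits in product([False, True], repeat=len(names)):
        yield dict(zip(names, bits))


def is_defined(phi, variables, x, X):
    """Return True iff phi AND x AND phi' AND NOT x' is unsatisfiable.

    phi' is phi with every variable outside X renamed to a fresh copy; here
    this is modeled by giving those variables a second, independent value.
    """
    if x in X:
        raise ValueError("x must not be a member of X")
    others = [v for v in variables if v not in X]
    for shared in assignments(list(X)):
        for plain in assignments(others):
            for primed in assignments(others):
                sigma = {**shared, **plain}
                sigma_primed = {**shared, **primed}
                if (phi(sigma) and sigma[x]
                        and phi(sigma_primed) and not sigma_primed[x]):
                    return False    # two models agree on X but differ on x
    return True


def and_gate(a):
    # x <-> (a AND b), a small gate definition used as the running example
    return a["x"] == (a["a"] and a["b"])
```

With this example, x is defined in terms of {a, b} but not in terms of {a} alone, since fixing a = 1 leaves the value of x dependent on b.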

While finding definitions is harder than deciding definability in theory, the
difference virtually disappears in practice. Our main theoretical contribution,
stated as Theorem 2 below, says that a definition can be obtained as an interpolant
of the formula constructed in the statement of Padoa’s Theorem. Since
interpolants can be efficiently (in linear time) generated from resolution proofs
[22,32], the distinction between detecting definability and computing definitions
becomes moot when a CDCL SAT solver is used to decide (un)satisfiability:
once it determines that the formula is unsatisfiable it has already (implicitly or
explicitly) produced a proof from which a definition can be extracted at a small
overhead.²


¹ We sometimes refer to existential strategy functions as Skolem functions and universal
strategy functions as Herbrand functions.

Before proving Theorem 2, we recall the definition of an interpolant following 
McMillan [36]. 


Definition 2 (Interpolant). Let ψ and χ be formulas such that ψ ∧ χ is
unsatisfiable. An interpolant for ψ and χ is a formula I such that


(1) ψ implies I,
(2) I ∧ χ is unsatisfiable, and
(3) I only refers to variables common to ψ and χ.


Craig’s Interpolation Theorem [9] states that every pair of jointly unsatisfiable
propositional formulas has an interpolant.³ It remains to show that an inter-
polant for a formula witnessing definability in fact yields a definition. 


Lemma 1. Let φ be a formula and let X ⊆ var(φ) be a subset of its variables.
Let φ′ be the formula obtained by replacing every variable y ∈ var(φ) \ X by
a new variable y′. For any variable x ∈ var(φ) \ X, an interpolant for φ ∧ x
and φ′ ∧ ¬x′ is a definition of x by X in φ.


Proof. Let I be an interpolant for φ ∧ x and φ′ ∧ ¬x′. By property (3) of Definition 2,
I only refers to the common variables var(φ ∧ x) ∩ var(φ′ ∧ ¬x′) = X
of these formulas. To see that I defines x in φ, consider a satisfying assignment
σ : var(φ) → {0,1} of φ. If σ(x) = 1 then φ ∧ x is satisfied by σ. The
formula φ ∧ x implies I by property (1), so I[σ] = 1 as well. Otherwise, σ(x) = 0
and we can construct a satisfying assignment σ′ of φ′ ∧ ¬x′ by setting σ′(v) = σ(v)
for v ∈ X along with σ′(v′) = σ(v) for v ∈ var(φ) \ X. By property (2), I ∧ φ′ ∧ ¬x′
is unsatisfiable, so we must have I[σ′] = I[σ] = 0.


Theorem 2. Let φ be a formula and let X ⊆ var(φ) be a subset of its variables.
Let φ′ be the formula obtained by replacing every variable y ∈ var(φ) \ X by a
new variable y′. A variable x ∈ var(φ) \ X is defined in terms of X in φ if, and
only if, the formula φ ∧ x ∧ φ′ ∧ ¬x′ is unsatisfiable, and a definition of x in
terms of X can be obtained as an interpolant for φ ∧ x and φ′ ∧ ¬x′.


Proof. By Theorem 1, variable x ∈ var(φ) \ X is defined in terms of X in φ if,
and only if, the formula φ ∧ x ∧ φ′ ∧ ¬x′ is unsatisfiable. Craig’s Interpolation
Theorem tells us that in this case there is an interpolant for φ ∧ x and φ′ ∧ ¬x′,
which defines x in terms of X by Lemma 1.
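
Lemma 1 can be illustrated on a toy formula. The sketch below does not extract an interpolant from a resolution proof. Instead it substitutes a different, brute-force construction: the strongest interpolant for φ ∧ x and φ′ ∧ ¬x′, obtained by existentially projecting φ ∧ x onto X and stored as a truth table. The predicate-based formula representation and all function names are assumptions made for this example.

```python
from itertools import product


def assignments(names):
    """Yield every assignment of the given variable names as a dict."""
    for bits in product([False, True], repeat=len(names)):
        yield dict(zip(names, bits))


def strongest_interpolant(phi, variables, x, X):
    """Truth table over X of: exists (var(phi) minus X) . (phi AND x)."""
    others = [v for v in variables if v not in X]
    table = {}
    for shared in assignments(list(X)):
        key = tuple(shared[v] for v in X)
        table[key] = any(
            phi({**shared, **rest}) and {**shared, **rest}[x]
            for rest in assignments(others)
        )
    return table


def is_definition(table, phi, variables, x, X):
    """Check that the table gives the value of x in every model of phi."""
    for sigma in assignments(variables):
        if phi(sigma) and table[tuple(sigma[v] for v in X)] != sigma[x]:
            return False
    return True


def and_gate(a):
    # x <-> (a AND b); x is defined in terms of {a, b}
    return a["x"] == (a["a"] and a["b"])


table = strongest_interpolant(and_gate, ["a", "b", "x"], "x", ["a", "b"])
```

For the example gate, the resulting table is true only when both a and b are true, so it recovers the definition a ∧ b. The projection satisfies properties (1) and (3) of Definition 2 by construction, and property (2) holds whenever x is defined in terms of X.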


² Assuming the SAT solver does not use the full power of the DRAT proof system [51].
³ In fact, the result holds even for first-order logic, but we will confine ourselves to the
propositional case.
4 Extracting Unique QBF Strategy Functions


In this section, we show how Theorem 2 can be used to extract unique strategy 
functions of QBFs. We say that the Skolem (Herbrand) function of an existential 
(universal) variable x in a QBF is unique if it is the same in every existential 
(universal) winning strategy. In particular, if x is existentially (universally) quan- 
tified and the formula is false (true), then the strategy function of x is trivially 
unique (there is none). In other words, the strategy function of a variable x 
is unique if there is at most one such function for x that is part of a winning 
strategy. The following result states that propositional definability is a sufficient 
condition for uniqueness of a strategy function. 


Proposition 1. Let $\Phi = Q_1 x_1 \dots Q_n x_n.\varphi$ be a QBF. If an existential
(universal) variable $x_i$ is defined in terms of variables
$X \subseteq \{x_j \mid 1 \le j < i,\ Q_j \ne Q_i\}$ in $\varphi$ ($\neg\varphi$), then its Skolem (Herbrand)
function is unique.


Proof. We only consider the case where $x_i$ is an existential variable of $\Phi$ (the
case where $x_i$ is a universal variable is symmetric). Let
$F = \{f_{x_j}\}_{x_j \in \mathrm{var}_\exists(\Phi)}$ and $G = \{g_{x_j}\}_{x_j \in \mathrm{var}_\exists(\Phi)}$ be existential
winning strategies and $\tau : \mathrm{var}_\forall(\Phi) \to \{0,1\}$ an assignment to the
universal variables. Since $F$ and $G$ are existential winning strategies, both
$\sigma_F = \tau \cup F(\tau)$ and $\sigma_G = \tau \cup G(\tau)$ must be satisfying assignments of $\varphi$.
The assignments $\sigma_F$ and $\sigma_G$ agree on $X \subseteq \mathrm{var}_\forall(\Phi)$, so we must have
$f_{x_i}(\tau) = \sigma_F(x_i) = \sigma_G(x_i) = g_{x_i}(\tau)$ because $x_i$ is defined in terms
of $X$. Since $\tau$ was chosen arbitrarily, this identity holds for every universal
assignment, so the functions $f_{x_i}$ and $g_{x_i}$ coincide.


To see that definability is not a necessary condition for a strategy function to 
be unique, consider the following example. 


Example 1. Let $\Phi = \forall x \exists y \forall z.\,(x \leftrightarrow y) \vee z$. The formula $\psi = x$ represents
the unique existential winning strategy (set $y$ to the same value as $x$). However,
variable $y$ is not defined in terms of $x$: the assignments $\{x, y, z\}$ and
$\{x, \neg y, z\}$ both satisfy the matrix and agree on $x$, but differ on $y$.
Intuitively, the reason why the existential strategy function for $y$ is unique in
spite of $y$ not being defined is that the universal player would never assign $z$
true as required by one of the assignments witnessing non-definability.
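
Both claims of Example 1 can be checked by exhaustive enumeration. The sketch below assumes the matrix is (x ↔ y) ∨ z with prefix ∀x ∃y ∀z, so that a Skolem function for y may only read x. All names in it are illustrative.

```python
from itertools import product

B = [False, True]

def matrix(x, y, z):
    # Matrix of Example 1: (x <-> y) OR z
    return (x == y) or z

# y is defined by x iff any two models that agree on x also agree on y.
models = [(x, y, z) for x, y, z in product(B, repeat=3) if matrix(x, y, z)]
y_defined_by_x = all(
    m1[1] == m2[1] for m1 in models for m2 in models if m1[0] == m2[0]
)

# A Skolem function for y is a map from the value of x to a value for y.
# It is winning if the matrix holds for every choice of x and z.
candidates = [dict(zip(B, outputs)) for outputs in product(B, repeat=2)]
winning = [
    f for f in candidates
    if all(matrix(x, f[x], z) for x in B for z in B)
]
```

Here `y_defined_by_x` evaluates to `False`, while `winning` contains only the identity function, which matches the observation that uniqueness does not imply definability.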


4.1 An Algorithm for Computing Unique Strategy Functions 


We now describe an algorithm for computing unique strategy functions of a QBF 
based on Proposition 1. By using an interpolating SAT solver (ITPSATSOLVER) 
that supports both incremental solving and assumptions [22], we can extract 
definitions for variables of a given quantifier type (universal or existential) using 
a single solver instance. Pseudocode is shown as Algorithm 1 below. 

Let $\Phi = Q_1 x_1 \dots Q_n x_n.\varphi$ be a QBF and let $Q \in \{\forall, \exists\}$ be a quantifier
type. Algorithm 1 first determines the leftmost variable $x_i$ in the prefix of $\Phi$
that has quantifier type $Q$ (line 3). The strategy function of any variable to the
right of $x_i$ in the prefix (including $x_i$ itself) may use the variables to its left
(shared), so we can begin by looking for definitions of $x_i$ in terms of shared.
Towards constructing the formula for the corresponding unsatisfiability check
according to Theorem 2, COPY($\varphi$, shared) returns a copy $\varphi'$ of the matrix $\varphi$
where each variable $x \in \mathrm{var}(\varphi) \setminus$ shared has been replaced by a fresh
variable $x'$. Next (lines 9-14), we consider each variable $x_j$ with quantifier
type $Q$ (these are the variables we want to find definitions of) and introduce two
fresh “selector” variables $s_j$ and $s'_j$, while adding clauses $(\neg s_j \vee x_j)$ and
$(\neg s'_j \vee \neg x'_j)$ to $\varphi$ and $\varphi'$, respectively. These clauses allow us to
represent $\varphi \wedge x_j \wedge \varphi' \wedge \neg x'_j$ by assuming literals $s_j$ and $s'_j$.⁴

After initializing the SAT solver, we consider the variables $x_1, \dots, x_n$ in the
order of the quantifier prefix (lines 18-29). If variable $x_j$ has quantifier type $Q$,
we want to check whether $x_j$ is defined in $\varphi$ in terms of oppositely quantified
variables $X_j$ that precede it in the prefix (Proposition 1 tells us that in this
case the strategy function of $x_j$ is unique). For the first such variable $x_j$, it is
clear that the set of variables common to $\varphi$ and $\varphi'$ is precisely $X_j$.
Unsatisfiability of $\varphi \wedge x_j \wedge \varphi' \wedge \neg x'_j$ is decided by calling the SAT solver
under assumptions $\{s_j, s'_j\}$: the assumptions ensure that $x_j$ and $\neg x'_j$ are set
to true by propagation, and all remaining selector variables can be set to false so
as to satisfy the clauses they occur in without interfering with the remaining
clauses. If the solver determines unsatisfiability, an interpolant $I_j$ is computed
(line 22), which by Theorem 2 corresponds to a definition of $x_j$, and the pair
$(x_j, I_j)$ is added to the set of definitions. Otherwise, if $x_j$ has the quantifier
type opposite to $Q$, the strategy function of any variable with quantifier type $Q$
considered later may use $x_j$. Accordingly (lines 26-27), we add clauses
$(x_j \vee \neg x'_j)$ and $(\neg x_j \vee x'_j)$ to $\varphi'$ through the incremental interface of
the SAT solver. This has two effects: first, it enforces equivalence of $x_j$
and $x'_j$, and second, $x_j$ is added to the common vocabulary of $\varphi$ and $\varphi'$, so
that it can appear in interpolants computed in later iterations.⁵

Soundness of Algorithm 1 as stated in the following proposition can be proved 
by a straightforward induction on the quantifier prefix using Theorem 2 and 
Proposition 1. 


Proposition 2. Given a quantified Boolean formula $\Phi$ and a quantifier type
$Q \in \{\forall, \exists\}$, Algorithm 1 terminates with a (possibly empty) set
$\{(x_1, I_1), \dots, (x_k, I_k)\}$ of pairs $(x_i, I_i)$ such that $I_i$ represents the unique
strategy function of $x_i$ in $\Phi$ and $x_i \in \mathrm{var}_Q(\Phi)$ for $1 \le i \le k$.


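Example 2. Consider the QBF $\Psi = \forall x_1 \exists y_1 \forall x_2 \exists y_2.\varphi$, where


$\varphi = (x_1 \vee y_1) \wedge (\neg x_1 \vee \neg y_1) \wedge (x_2 \vee y_2) \wedge (\neg x_2 \vee \neg y_2)$.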


⁴ Two distinct selector variables are required to ensure that they do not belong to the
common variables of $\varphi$ and $\varphi'$.

⁵ One could also add these clauses to $\varphi$, in which case $x'_j$ would become part of the
shared vocabulary. This has the slight disadvantage that subsequently computed
definitions may use a mixture of variables from $\varphi$ and $\varphi'$, rather than just $\varphi$.
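Algorithm 1. Extracting Unique Strategy Functions by Interpolation


 1: procedure GETDEFINITIONSQBF($\Phi$, $Q \in \{\forall, \exists\}$)
 2:     $Q_1 x_1 \dots Q_n x_n.\varphi \leftarrow \Phi$
 3:     $i \leftarrow \min\{1 \le i \le n \mid Q_i = Q\}$
 4:     shared $\leftarrow \{x_1, \dots, x_{i-1}\}$
 5:     if $Q = \forall$ then
 6:         $\varphi \leftarrow \neg\varphi$          ▷ $\forall$-strategies aim to falsify the matrix.
 7:     end if
 8:     $\varphi' \leftarrow$ COPY($\varphi$, shared)
 9:     sametype $\leftarrow \{j \mid 1 \le j \le n \text{ and } Q_j = Q\}$
10:     for $j \in$ sametype do
11:         $s_j, s'_j \leftarrow$ fresh variables
12:         $\varphi \leftarrow \varphi \wedge (\neg s_j \vee x_j)$
13:         $\varphi' \leftarrow \varphi' \wedge (\neg s'_j \vee \neg x'_j)$
14:     end for
15:     solver $\leftarrow$ ITPSATSOLVER($\varphi$, $\varphi'$)
16:     defined $\leftarrow \emptyset$
17:     $k \leftarrow \max\{i \le k \le n \mid Q_k = Q\}$
18:     for $j = 1, \dots, k$ do
19:         if $Q_j = Q$ then
20:             result $\leftarrow$ solver.SOLVE($\{s_j, s'_j\}$)
21:             if result = UNSAT then
22:                 $I_j \leftarrow$ solver.GETINTERPOLANT()
23:                 defined $\leftarrow$ defined $\cup\ \{(x_j, I_j)\}$
24:             end if
25:         else                              ▷ $Q_j \ne Q$
26:             solver.ADDCLAUSE($\varphi'$, $x_j \vee \neg x'_j$)
27:             solver.ADDCLAUSE($\varphi'$, $\neg x_j \vee x'_j$)
28:         end if
29:     end for
30:     return defined
31: end procedure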


We illustrate a run of Algorithm 1 on $\Psi$ with $Q = \exists$. Since $y_1$ is the leftmost
existential variable, we create a copy $\varphi'$ of $\varphi$ with every variable except $x_1$
renamed, that is,


$\varphi' = (x_1 \vee y'_1) \wedge (\neg x_1 \vee \neg y'_1) \wedge (x'_2 \vee y'_2) \wedge (\neg x'_2 \vee \neg y'_2)$.


We also add the clauses $(\neg s_1 \vee y_1)$ and $(\neg s_2 \vee y_2)$ to $\varphi$ and the clauses
$(\neg s'_1 \vee \neg y'_1)$ and $(\neg s'_2 \vee \neg y'_2)$ to $\varphi'$. In the main loop, Algorithm 1 first
checks whether $\varphi \wedge \varphi'$ is unsatisfiable under the assumptions $\{s_1, s'_1\}$. Unit
propagation simplifies $\varphi$ to (omitting unused selector variables and clauses)


$(\neg x_1) \wedge (x_2 \vee y_2) \wedge (\neg x_2 \vee \neg y_2)$,

and $\varphi'$ simplifies to


$(x_1) \wedge (x'_2 \vee y'_2) \wedge (\neg x'_2 \vee \neg y'_2)$.
By resolving $(\neg x_1)$ with $(x_1)$ we obtain the empty clause, and $\neg x_1$ is the
corresponding interpolant,⁶ so $(y_1, \neg x_1)$ is added to the set of definitions.
Next, we consider the universally quantified variable $x_2$ and add the clauses
$(x_2 \vee \neg x'_2)$ and $(\neg x_2 \vee x'_2)$ to $\varphi'$. Finally, we check whether $y_2$ is
definable by calling the SAT solver under the assumptions $\{s_2, s'_2\}$. Now, the
formula $\varphi$ simplifies to


$(x_1 \vee y_1) \wedge (\neg x_1 \vee \neg y_1) \wedge (\neg x_2)$,

and $\varphi'$ simplifies to

$(x_1 \vee y'_1) \wedge (\neg x_1 \vee \neg y'_1) \wedge$
$(x'_2) \wedge (x_2 \vee \neg x'_2) \wedge (\neg x_2 \vee x'_2)$.

Unit propagation derives the clause $(x_2)$ from the clauses in the second line,
which can be resolved with the clause $(\neg x_2)$ from $\varphi$ to obtain a resolution
refutation of the formula $\varphi \wedge \varphi'$, with $\neg x_2$ as an interpolant. Accordingly,
$(y_2, \neg x_2)$ is added to the set of definitions. Algorithm 1 terminates with the
definitions $\{(y_1, \neg x_1), (y_2, \neg x_2)\}$, and it is readily verified that
$y_1 = \neg x_1$, $y_2 = \neg x_2$ is indeed the unique existential winning strategy of $\Psi$.
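
The outcome of this run can be reproduced without a SAT solver. The sketch below is a brute-force analogue of Algorithm 1 under stated assumptions: it enumerates all models of the matrix and returns truth tables in place of interpolants, so it is only usable on tiny formulas. The name `get_definitions` and the string tags `"forall"` and `"exists"` are hypothetical.

```python
from itertools import product

def get_definitions(prefix, matrix, Q):
    """Brute-force analogue of Algorithm 1 (truth tables, not interpolants).

    prefix: list of (quantifier, variable) pairs, outermost first.
    matrix: predicate over assignments (dicts from variable name to bool).
    Q:      "exists" or "forall", the quantifier type to find definitions for.
    """
    names = [v for _, v in prefix]
    # Universal strategies aim to falsify the matrix (line 6 of Algorithm 1).
    target = (lambda s: not matrix(s)) if Q == "forall" else matrix
    models = [
        s
        for bits in product([False, True], repeat=len(names))
        for s in [dict(zip(names, bits))]
        if target(s)
    ]
    defined, shared = {}, []
    for quant, v in prefix:
        if quant != Q:
            # Opposite type: later definitions may use this variable.
            shared.append(v)
            continue
        table, is_defined = {}, True
        for s in models:
            key = tuple(s[u] for u in shared)
            if table.setdefault(key, s[v]) != s[v]:
                is_defined = False  # two models agree on shared, differ on v
                break
        if is_defined:
            defined[v] = (tuple(shared), table)
    return defined

def phi(s):
    # Matrix of Example 2.
    return ((s["x1"] or s["y1"]) and (not s["x1"] or not s["y1"])
            and (s["x2"] or s["y2"]) and (not s["x2"] or not s["y2"]))

prefix = [("forall", "x1"), ("exists", "y1"), ("forall", "x2"), ("exists", "y2")]
defs = get_definitions(prefix, phi, "exists")
```

For Example 2 the table for y1 is indexed by x1 and the table for y2 by (x1, x2). Both agree with the definitions ¬x1 and ¬x2 obtained from the interpolants, although the table for y2 mentions x1 without depending on it.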


4.2 Improvements and Generalization to Dependency QBF 


Consider a QBF $\Phi = \forall x_1 \forall x_2 \exists y_1 \exists y_2.\,(x_1 \leftrightarrow x_2) \leftrightarrow (y_1 \leftrightarrow y_2)$. It is easy to
verify that $\Phi$ is true and that $y_1$ and $y_2$ do not have unique Skolem functions:
for every assignment to the universal variables there are two ways of setting $y_1$
and $y_2$ so as to satisfy the matrix, so neither existential variable is defined by
the universal variables alone. However, each variable is defined by all remaining
variables. For instance, variable $y_2$ is defined by $x_1$, $x_2$, and $y_1$.
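
These claims are easy to confirm by enumeration. The sketch below reads the matrix as (x1 ↔ x2) ↔ (y1 ↔ y2), which is an assumption about the intended formula. The helper `defined` and the position-based indexing are illustrative only.

```python
from itertools import product

B = [False, True]

def matrix(x1, x2, y1, y2):
    # Assumed matrix: (x1 <-> x2) <-> (y1 <-> y2)
    return (x1 == x2) == (y1 == y2)

# Models as tuples (x1, x2, y1, y2), i.e. positions 0, 1, 2, 3.
models = [m for m in product(B, repeat=4) if matrix(*m)]

def defined(target, by):
    """True iff models that agree on the positions in `by` agree on `target`."""
    seen = {}
    for m in models:
        key = tuple(m[i] for i in by)
        if seen.setdefault(key, m[target]) != m[target]:
            return False
    return True

# Number of ways to set (y1, y2) for each universal assignment (x1, x2).
responses = {
    (x1, x2): sum(1 for m in models if m[:2] == (x1, x2))
    for x1, x2 in product(B, repeat=2)
}

y2_by_universals = defined(3, [0, 1])      # y2 in terms of x1, x2
y2_by_all_others = defined(3, [0, 1, 2])   # y2 in terms of x1, x2, y1
y1_by_all_others = defined(2, [0, 1, 3])   # y1 in terms of x1, x2, y2
```

Every universal assignment admits exactly two responses, so neither existential variable is defined by the universal variables, yet each one becomes defined once the other existential variable is added to the defining set.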

More generally, increasing the set of defining variables allows us to detect
more definitions: if $x$ is defined in terms of $X$ then it is also defined in terms
of any enclosing set $X' \supseteq X$. To exploit this, we modified Algorithm 1 so as
to assume a total ordering of variables and check for definitions of a variable $x$
in terms of all variables $X$ which precede it in the quantifier prefix. This can
be implemented by simply adding clauses encoding equivalence of $x_j$ and $x'_j$
(lines 26-27) regardless of quantifier type.

Technically, this leads to an alternative definition of a “winning strategy” 
for a QBF where each strategy function takes an assignment to all preceding 
variables as input. Both definitions are ultimately equivalent in the sense that 
a winning strategy according to one definition can be transformed into a win- 
ning strategy according to the other definition without changing its responses 
(cf. the work on quantifier elimination by functional composition and self- 
substitution [8,14,28,29]). One can prove an analogue of Proposition 1 stating 
that the strategy function—according to the alternative definition—of a vari- 
able x is unique whenever x is defined in terms of the variables preceding x in 
the quantifier prefix. 


⁶ As mentioned above, interpolants can be efficiently extracted from resolution
refutations [32,36,46].
Dependency Quantified Boolean Formulas (DQBFs) generalize QBFs by
allowing a non-linear quantifier prefix. More specifically, each existential variable 
is annotated with a set of universal variables its Skolem function may depend 
on. A DQBF is true if there is an existential winning strategy such that each 
Skolem function satisfies these restrictions [2]. Although evaluating DQBF is 
NEXPTIME-complete and thus believed to be much harder than evaluating 
QBF, the fact that problems can be concisely encoded in DQBF [12,18] has 
prompted the development of dedicated DQBF solvers [13, 15, 17,48]. 

Algorithm 1 can easily be extended to compute unique Skolem functions of
DQBF. The standard DQDIMACS format [15] allows for the combination of a
linear quantifier prefix with variables for which the dependency sets are explicitly
stated. The linear quantifier prefix can be handled as before. For each existential
variable $x$ with explicit dependency set $D_x$, we simply check whether $x$ is defined
by $D_x$. If multiple variables $x_1, \dots, x_k$ have the same dependency set $D$ (which
is frequently the case in benchmark formulas) we check whether $x_i$ is defined
by $D \cup \{x_1, \dots, x_{i-1}\}$ for each $1 \le i \le k$. Again, this technically requires a
non-standard definition of Skolem functions for DQBF but can easily be proven
sound.


5 Implementation 


We implemented the algorithm described in the previous section in a prototype
named UNIQUE. As a back end SAT solver we use ITPMINISAT, a modified version
of MINISAT [11] bundled with the EXTAVY model checker that efficiently
generates interpolants in memory and supports both assumptions and incremental
solving [22,49]. UNIQUE can read PCNF formulas (QDIMACS), prenex
non-CNF QBFs (QCIR), as well as DQBFs with CNF matrices (DQDIMACS).

Interpolants obtained from ITPMINISAT are represented as And-Inverter
graphs (AIGs) and accessed through the AIG library of ABC [7]. To make use
of the structural sharing capabilities of AIGs, we maintain a single AIG
representing the interpolants computed in the main loop (lines 18-29) of Algorithm 1.
Whenever a new interpolant is obtained, the corresponding AIG returned by
ITPMINISAT is merged into the existing AIG. If the number of AIG nodes exceeds
a (geometrically increasing) threshold, we use the ABC macro compress2 to
reduce the size of the combined AIG. Upon termination, and assuming the AIG
is not too large, this is followed up by a round of FRAIGing [37] and a final
application of compress2.

While running UNIQUE on QBFs with multiple quantifier alternations we
noticed that ITPMINISAT got stuck attempting to solve some of the definability
queries. Further testing revealed that the corresponding instances were hard
for most state-of-the-art solvers. Increasing the overall timeout would allow us
to solve these instances in some cases, but naturally the corresponding
interpolants (for unsatisfiable instances) were very large (and difficult to compress
with ABC). This clearly defeats the purpose of detecting unique strategy
functions quickly. We thus decided to impose a limit on the number of conflicts
for each call of ITPMINISAT (currently set to 1000 conflicts). This significantly
reduces the overall running time of UNIQUE for many instances and ensures that
individual interpolants are small, but only marginally decreases the total number
of definitions found.

Since the individual definability queries are independent of each other, it is 
not necessary to determine for each input variable whether it is defined. Accord- 
ingly, we implemented UNIQUE as an anytime algorithm: upon termination, it 
returns the set of variables with unique strategy functions identified up to that 
point, along with the AIG representing the corresponding functions. 


6 Experiments 


For the experiments described below we used a cluster with Intel Xeon E5649 
processors at 2.53 GHz running 64-bit Linux. 


6.1 Gate Extraction 


We first ran UNIQUE to compute unique strategy functions for the instances in
the 2QBF (402 instances) benchmark set from the 2018 QBF Evaluation, as
well as the PCNF (558), QCIR (341), and DQBF (333) benchmark sets from
the 2019 QBF Evaluation.⁷ For each job we imposed a time limit of 600 s and a
memory limit of 1.8 GB.
Fig. 1. Running time (s) of UNIQUE by benchmark set. For each 50-s interval within
the time limit (x-axis), the number of instances (y-axis) processed by UNIQUE with a 
running time in that interval is shown. 


Figure 1 shows a histogram for the running time of UNIQUE on different
benchmark sets. While most instances are processed quickly, UNIQUE runs into
the time limit for a significant number of PCNF instances. Generally, the running
time increases with the size of the matrix and the number of variables. This
explains why almost all DQBF formulas are processed quickly, as these tend to
be much smaller compared to formulas from the other benchmark sets.


⁷ http://www.qbflib.org.

Figure 2 shows a histogram for the fraction of existential variables with unique 
strategy functions in 2QBF and PCNF instances (turquoise bars). We clearly 
see a bimodal distribution here: there is a large number of instances where the 
strategy functions of most variables are unique, but also a significant number 
of instances where few existential strategy functions are unique. To determine 
how many of the corresponding definitions cannot be found by syntactic gate 
detection, we used the QCIR-CONV script provided by GHOSTQ [31] to convert 
2QBF and PCNF instances to QCIR, and ran UNIQUE again on the resulting cir- 
cuits. To do this, the circuit is translated (back) to CNF, but auxiliary variables 
representing gates are ignored by the definability check. Testing showed that a 
one-sided CNF encoding [42] works better than standard Tseitin conversion. 


[Figure 2: two histogram panels, 2QBF (left) and PCNF (right); x-axis: fraction of
unique existential strategy functions; y-axis: number of instances.]

Fig. 2. Fraction of existential variables with unique strategy functions in 2QBF (left) 
and PCNF (right) instances before (turquoise) and after (red) syntactic gate detection. 
For each fraction (x-axis) we see the number of instances (y-axis) with the correspond- 
ing fraction of unique existential strategy functions. (Color figure online) 


Table 1 (left) shows quartiles for the distributions of unique existential
strategy functions detected by UNIQUE in each benchmark set.⁸ We only show the
distribution for existential variables in Table 1 and Fig. 2 since very few
universal variables were found to have unique strategy functions. In fact, only 51
instances from the QCIR benchmark set encoding bounded synthesis for Petri
games contained such universal variables.

The fraction of variables with unique strategy functions was smallest for
QCIR instances. This is expected, since they can represent circuit structure
directly and do not require auxiliary variables to encode gate definitions. By
contrast, 2QBF and DQBF instances contain many variables with unique
strategy functions. For about half of the instances, between roughly 90% and 95% of
the existential strategy functions are unique.


Table 1. Distribution (quartiles) of the fraction of unique Skolem functions identified
by UNIQUE before (left) and after (right) preprocessing with HQSPRE. Rows marked
by a star (*) show the distribution after syntactic gate detection.


        | Original               | Preprocessed
        | 1st  | Median | 3rd    | 1st | Median | 3rd
 2QBF   | 0.03 | 0.9    | 0.96   | 0   | 0      | 0
 2QBF*  | 0    | 0.22   | 0.54   | 0   | 0      | 0
 PCNF   | 0    | 0.53   | 0.94   | 0   | 0      | 0.03
 PCNF*  | 0    | 0.21   | 0.53   | 0   | 0      | 0.02
 QCIR   | 0    | 0      | 0.13   | -   | -      | -
 DQBF   | 0.57 | 0.88   | 0.94   | 0   | 0.22   | 0.45


⁸ For instance, the left side of the first row of Table 1 says that for 75% of 2QBF
instances, UNIQUE was able to identify at least 3% of Skolem functions as unique; for
half of the instances, at least 90% of existential variables were identified as having
unique Skolem functions; and for 25% of instances, at least 96%.

On the right of Table 1 we show the distribution of unique existential strategy 
functions after preprocessing with HQSPRE [52]. Clearly, only very few unique 
Skolem functions are detected by UNIQUE. This may be in part due to the fact 
that preprocessing detects and removes gate definitions [27]. Another possibil- 
ity is that definitions are simply lost: some of the most powerful preprocessing 
techniques for QBF currently used only preserve the truth value and not the set 
of strategies [23]. We will return to this topic at the end of the next subsection. 


6.2 Solving Formulas Augmented with Definitions 


Unique strategy functions of a (D)QBF can be substituted for their variables 
without changing the set of winning strategies. This can be used in preprocessing 
to reduce the number of quantified variables, typically at the cost of increasing 
the size of the matrix. In the following experiments, we substituted definitions 
found by UNIQUE for the defined variables and ran QBF and DQBF solvers on 
the resulting instances. 

First, we considered the 2QBF benchmark set. We picked the QCIR solvers
QUABS [47], QFUN [26], and GHOSTQ [31], along with the dedicated 2QBF
(PCNF) solver CADET [43]. For the QCIR solvers, the performance on
instances constructed by syntactic gate detection with QCIR-CONV serves
as a baseline. We compare it with performance on instances obtained by
UNIQUE and, since QCIR-CONV also performs circuit-level simplifications that
go beyond gate extraction, with a combination of both where QCIR-CONV and
UNIQUE are run in sequence.

For CADET, we compare performance on the original 2QBF instances with
performance on QDIMACS instances augmented with CNF encodings of
definitions extracted by UNIQUE. For each configuration, we report the number of
instances solved within a time limit of 15 min. To isolate the effect of adding
definitions, the time required by UNIQUE (and QCIR-CONV) is not counted towards
the time limit.⁹ The results are shown in Fig. 3 (left).


[Figure 3: bar charts in two panels, Original (left) and Preprocessed (right);
x-axis: solvers QFUN, QUABS, GHOSTQ, CADET; y-axis: number of instances solved;
legend (gate detection): QCIR-CONV, UNIQUE, Both, None.]


Fig. 3. Number of 2QBF instances solved (y-axis) by solvers (x-axis) using different 
gate detection methods before (left) and after (right) preprocessing with HQSPRE. 


QFUN, QUABS, and GHOSTQ benefit considerably from semantic gate extrac- 
tion, in particular when applied on top of syntactic gate extraction. By contrast, 
CADET solves fewer instances augmented with gate definitions than original 
instances. We found this surprising, since variable definitions should be detected 
by CADET’s heuristic for identifying unique Skolem functions. Perhaps most 
definitions found by UNIQUE are already covered in this way, so that the addi- 
tional clauses simply slow down propagation. We believe that explicitly telling 
CADET which variables have already been identified as determined should 
result in a speedup overall. 

Figure 4 takes a closer look at solving times for individual instances (for this 
plot, memory outs are treated as timeouts). CADET is slower on instances 
augmented by UNIQUE but fairly consistent, while the effect on the other solvers 
is more erratic. We conjecture that this is because the set of existential strategies 
is preserved and the instances thus “look similar” to CADET. 

Next, we tested with PCNF instances and considered the QDIMACS solvers
DEPQBF [5] and CAQE [44], as well as the QCIR solvers QUABS [47],
QFUN [26], and QUTE [40]. Again, we compare the number of instances solved
in 15 min with different options for gate detection. Results are shown in Fig. 5
(left). Again all QCIR solvers benefit from gate detection with UNIQUE when
performed on top of syntactic gate detection with QCIR-CONV, while performance
decreases for both QDIMACS solvers. The additional clauses and variables
introduced by UNIQUE apparently do not help these solvers and simply result in a
slowdown.


⁹ The results are qualitatively the same when the running time of UNIQUE is counted
towards the time limit: the largest decrease in the number of solved instances across
all benchmark sets and configurations is 7.

Fig. 4. Solving time (s) for 2QBF instances with (x-axis) and without UNIQUE (y-axis).

Finally, we tested the impact of UNIQUE on DQBF (DQDIMACS) instances 
solved by HQS [19] and DCAQE [48] within 15 min. Since DQBF solvers cur- 
rently do not (yet) support non-CNF input, we translate definitions to CNF and 
add them to the original formulas. Note that whenever an existential variable x 
is defined by (a subset of) its dependency set, we can safely let x depend on 
additional variables. This is sound since the response of variable x is already 
determined by the variables in the original dependency set and cannot change 
depending on other inputs. In particular, we can collect all defined variables (and 
auxiliary variables) in an “innermost” existential quantifier block that depends 
on all universal variables. Since many existential variables have uniquely deter- 
mined strategy functions (see Table 1), this allows us to push many variables 
into the innermost quantifier block and get closer to a linear quantifier prefix. 
For HQS, this translates into a small increase in the number of solved instances 
(208 vs. 189), whereas DCAQE basically solves the same number of instances 
(133 vs. 135). 


Interaction with Preprocessing. QBF solvers for PCNF are typically paired
with preprocessors such as BLOQQER [6] or HQSPRE [52]. These are highly
engineered tools that batter instances with a barrage of techniques and can
often solve formulas completely on their own. Most solvers benefit greatly from
preprocessing. This is evident in Fig. 5 (right), which shows the number of solved
PCNF instances with different forms of gate detection after preprocessing with
HQSPRE (within a timeout of 600 s). Here, the number of solved instances
increases significantly for almost all systems.


Fig. 5. PCNF instances solved (y-axis) by solver (x-axis) using different methods for
gate detection before (left) and after (right) preprocessing with HQSPRE.

At the same time, preprocessing appears to obscure or destroy definitions.
UNIQUE hardly finds any definitions in preprocessed instances (cf. Table 1) and
accordingly has little impact on performance. For QFUN, which benefitted most
from gate detection in our experiments, this translates to a substantial reduction
in the number of solved instances. On the 2QBF benchmark set (Fig. 3), both
QFUN and GHOSTQ solve significantly fewer instances with HQSPRE compared
to the combination of UNIQUE and QCIR-CONV, whereas the number of solved
instances almost doubles for QUABS. Understanding which preprocessing
techniques obscure gate definitions and why certain solvers benefit more from gate
detection than others are important questions for future work.¹⁰


¹⁰ We also ran experiments with QCIR-CONV and UNIQUE applied before preprocessing.
The results were significantly worse, so we do not report them in detail. Standard
preprocessing requires PCNF input, so that definitions have to be encoded using
additional clauses and Tseitin variables. Just like the PCNF solvers in the other
experiments, HQSPRE appears to be unable to do anything useful with these extra
clauses and variables.


7 Related Work


Our semantic gate detection technique is closely related to a method for
determinizing Boolean relations by Jiang et al. [29], a problem that essentially
corresponds to solving 2QBF. The authors show that, for a (total) relation $R(X, y)$
with a single output variable $y$, a functional implementation of $y$ can be obtained
as an interpolant for $\neg R(X, 0) \wedge \neg R(X, 1)$. This can be used to determinize
relations $R(X, Y)$ with a set of output variables $Y = \{y_1, \dots, y_n\}$. First, an
implementation $f_n$ for $y_n$ can be computed by treating $R$ as a relation with
inputs $X \cup \{y_1, \dots, y_{n-1}\}$ and single output $y_n$. Subsequently, the
implementation $f_n$ can be substituted for $y_n$ to obtain a relation
$R'(X, Y \setminus \{y_n\})$. By repeating this process, a functional implementation $f_1$
of $y_1$ can eventually be obtained. Substituting $f_i$ into $f_{i+1}$ for $1 \le i < n$
results in functional implementations that only depend on the original input
variables $X$. This approach does not require any of the output variables to be
defined by $X$, but an implementation of $y_i$ solely in terms of the input
variables $X$ is only available at the very end of this process. For deterministic
relations $R(X, Y)$ (where every $y$ is defined in terms of $X$), the authors show
that a functional implementation of $y \in Y$ can be obtained as the interpolant of
a formula that corresponds to the formula in the statement of Padoa’s theorem.
Our result stated as Theorem 2 is more general in that it holds for multi-output
relations that are not necessarily deterministic.
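
The single-output construction attributed to Jiang et al. [29] can be illustrated on a toy relation. The sketch below is an illustration only, under stated assumptions: the relation `R` is hypothetical, and the interpolant conditions are checked by enumerating every Boolean function over the inputs instead of calling an interpolating solver.

```python
from itertools import product

B = [False, True]
INPUTS = list(product(B, repeat=2))  # assignments to X = (a, b)

def R(a, b, y):
    # Hypothetical total relation: y is forced to 1 if a and b both hold,
    # forced to 0 if neither holds, and unconstrained otherwise.
    if a and b:
        return y
    if not a and not b:
        return not y
    return True

def is_interpolant(f):
    # f is an interpolant for  NOT R(X,0)  and  NOT R(X,1)  iff
    #   NOT R(X,0) implies f(X), and f(X) AND NOT R(X,1) is unsatisfiable.
    return all(
        (R(a, b, False) or f[(a, b)]) and (not f[(a, b)] or R(a, b, True))
        for a, b in INPUTS
    )

def implements(f):
    # f is a functional implementation of y iff R(X, f(X)) holds for all X.
    return all(R(a, b, f[(a, b)]) for a, b in INPUTS)

# All 16 Boolean functions over (a, b), represented as truth tables.
functions = [dict(zip(INPUTS, outputs)) for outputs in product(B, repeat=4)]
interpolants = [f for f in functions if is_interpolant(f)]
implementations = [f for f in functions if implements(f)]
```

For this relation the interpolants and the functional implementations coincide: there are four of each, one for every way of resolving the two unconstrained input assignments.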

Hofferek et al. use interpolation to synthesize multiple functional
implementations from a single proof and thus avoid the increase in formula size incurred
by repeated substitution [24]. This has an analogue in strategy extraction for
QBF, which allows for implementations of all (existential or universal) variables
to be obtained from a proof [3]. However, strategy extraction requires that the
input QBF has been solved, whereas our main interest is in preprocessing QBF.

There is a series of works on recovering gate definitions from CNF formu- 
las. Li integrated rules for detecting equivalent literals in a Davis-Putnam style 
algorithm [35]. Ostrowski et al. represent formulas as graphs to detect patterns 
corresponding to and-gates, or-gates, and equivalences [38]. Roy et al. use CNF 
signatures to detect a richer set of gates [45]. Fu and Malik extend this to arbi- 
trary (user-specified) gate libraries and ensure that a maximum acyclic circuit 
is constructed [16]. 

In the context of QBF, Bacchus and Goultiaeva showed that circuit recon- 
struction can speed up solvers by providing them with a better set of initial 
cubes [21]. They also extended the scope of these techniques to CNF formulas 
obtained from circuits by the Plaisted-Greenbaum encoding [42]. Scholl and Pig- 
orsch developed a QBF solver that manipulates an AIG representation of the 
matrix to perform quantifier elimination and relies on circuit reconstruction to 
simplify the initial AIG [41]. 

Balabanov et al. proposed a SAT-based semantic gate extraction
technique [4]. Their approach has the disadvantage that a subset of clauses inducing
a definition has to be guessed. As a more efficient heuristic, they suggest to
identify pseudo definitions instead. A set of clauses
$(A_1 \vee x), \dots, (A_k \vee x), (B_1 \vee \neg x), \dots, (B_l \vee \neg x)$ is a pseudo definition
of $x$ if the formula $A_1 \wedge \dots \wedge A_k \wedge B_1 \wedge \dots \wedge B_l$ is unsatisfiable. Rabe and
Seshia use a similar criterion in their incremental determinization algorithm to
identify variables that are (locally) deterministic [43]. Checking for pseudo
definitions is typically efficient but limits the range of definitions that can be
detected.
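
The pseudo-definition criterion can be sketched as follows, reading it as unsatisfiability of the conjunction of the residual clauses obtained by deleting x and ¬x. This is a minimal sketch assuming DIMACS-style integer literals. The function name `is_pseudo_definition` is hypothetical, and the unsatisfiability test is done by enumeration where a real implementation would call a SAT solver.

```python
from itertools import product

def is_pseudo_definition(clauses, x):
    """Check whether the clauses mentioning x form a pseudo definition of x.

    clauses: iterable of clauses, each a list of non-zero ints
             (positive literal v, negative literal -v).
    x:       positive int naming the candidate variable.
    """
    # Residual clauses A_i (from clauses with x) and B_j (from clauses with -x).
    residuals = [
        [lit for lit in clause if abs(lit) != x]
        for clause in clauses
        if x in clause or -x in clause
    ]
    variables = sorted({abs(lit) for clause in residuals for lit in clause})
    for bits in product([False, True], repeat=len(variables)):
        value = dict(zip(variables, bits))
        if all(any(value[abs(lit)] == (lit > 0) for lit in clause)
               for clause in residuals):
            return False  # all A_i and B_j are jointly satisfiable
    return True

# Variables: 1 = a, 2 = b, 3 = x.
# CNF encoding of the gate x <-> (a AND b).
and_gate = [[-3, 1], [-3, 2], [3, -1, -2]]
# (x OR a) AND (NOT x OR b) does not determine x.
loose = [[3, 1], [-3, 2]]

gate_result = is_pseudo_definition(and_gate, 3)
loose_result = is_pseudo_definition(loose, 3)
```

For the and-gate encoding the residual clauses are (¬a ∨ ¬b), (a) and (b), which are jointly unsatisfiable, so the check succeeds. For the second clause set the residuals (a) and (b) are satisfiable, so it fails.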




8 Conclusion 


Syntactic gate detection has been shown to benefit SAT solvers [10,16] and 
QBF solvers [21]. The underlying algorithms are fast but limited to a predefined 
library of gates. By contrast, our semantic gate extraction method can detect any 
definition entailed by an input formula but requires an interpolating SAT solver. 
In the context of SAT, this overhead likely outweighs any potential benefits. 
However—as demonstrated by our experiments—there is significant potential 
for application to harder problems such as QBF and DQBF evaluation. Here, 
preprocessing is just a first step. 

At the same time, our results show that substituting unique strategy func- 
tions can slow down solvers. In some sense, this is counter-intuitive: ideally, 
providing solvers with unique strategy functions should give them a head start, 
or at least not hurt their performance. By analogy, if we give a SAT solver part of 
a backbone assignment, it can simply instantiate accordingly and need not con- 
sider the corresponding variables for the remainder of its run. With the exception 
of CADET, QBF solvers currently cannot “instantiate” variables with strategy 
functions in this way, since they are only equipped to reason about assignments. 
We believe that designing techniques for reasoning about strategies is a key 
challenge in developing the next generation of QBF solvers. 


Acknowledgements. The author would like to thank Adrian Rebola-Pardo, Matthias 
Schlaipfer, and Georg Weissenbacher for helpful discussions. 

Abstract. We present TARTAR, an automatic repair analysis tool that, 
given a timed diagnostic trace (TDT) obtained during the model check- 
ing of a timed automaton model, suggests possible syntactic repairs of the 
analyzed model. The suggested repairs include modified values for clock 
bounds in location invariants and transition guards, adding or removing 
clock resets, etc. The proposed repairs guarantee that the given TDT 
is no longer feasible in the repaired model, while preserving the overall 
functional behavior of the system. We give insights into the design and 
architecture of TARTAR, and show that it can successfully repair 69% 
of the seeded errors in system models taken from a diverse suite of case 
studies. 


1 Introduction 


A reactive system with requirements pertaining to its timing behavior is often 
modeled as a network of timed automata (NTA) [BY03]. Whether a timing 
requirement holds in an NTA can be analyzed by timed model checkers such 
as Uppaal [BLL+95] or opaal [DHJ+11]. In case of a requirement violation, a 
model checker returns a timed counterexample, also called a timed diagnostic 
trace (TDT). Until now, developers must manually identify and correct such 
violations by analyzing the generated TDTs. It is therefore desirable to support 
this process by an automated tool set that not only determines whether timing 
requirements are met, but also proposes syntactic repairs of the NTA in case 
they are not. 

In [KLW19] we presented an automated repair analysis that analyzes a TDT 
obtained from the violation of a timed safety property and returns syntactic 
repair suggestions that avoid the concrete executions of the TDT violating the 
property. The analysis performs an additional admissibility check ensuring that 
the repaired model is functionally equivalent with the original NTA, which means 
that no action traces are added or omitted by the repair. 

Fig. 1. Network of timed automata - running example

To illustrate the repair analysis, consider the NTA in Figs. 1(a) and (b). It
describes a client that sends a request req to a database db and expects to receive
a response ser within 4 time units after sending the request. The client contains a
clock x that measures the time delay between the request creation and the receiving
of a response in location serReceiving. The NTA admits a TDT
that violates the property, illustrated as a sequence diagram with time intervals 
in Fig. 1(c). A time interval in the sequence diagram denotes the minimal and 
maximal time delay for the message transmission and processing times in db, 
respectively. The repair computation analyzes the TDT and produces several 
syntactic repairs to the NTA that avoid the property violation. In [KLW19], the 
computed repairs aim at the modification of clock bounds in location invariants 
and transition guards. An example of such a repair is to reduce the bound in the 
time constraint w < 2 from 2 to 1. The modified bound constrains the maximal 
transmit time of the req message so that the resulting NTA receives all responses 
within the expected time. This repair eliminates the problematic executions of 
the TDT in the original NTA without changing the functional behavior of the 
system, which is confirmed by an admissibility test defined in [KLW19]. How- 
ever, in general, it may not be possible to repair the model using only clock 
bound alterations. 


Contributions. We present TARTAR [tar20], which extends the initial prototype
implementation of the clock bound repair analysis presented in [KLW19] to a 
more comprehensive NTA repair tool. Specifically, the extended tool implements 
new analyses that can suggest a whole range of repairs in addition to clock 
bound variation, such as modifying comparison operators in constraints, clock 
references, clock resets, and location urgency. Examples of new repairs computed 
for the model in Fig. 1 are: 


— Exchanging the comparison operator in the constraint w ≥ 1 to w < 1 ensures
that the time to send a request is below 1 time unit.

— An exchange of clock z in z < 2 with clock y restricts the time of processing
and receiving the response to at most 2 time units.

— Resetting the clock y on the previous transition instead ensures that the time
for sending and processing the request is below 1 time unit.

— Making the location serReceiving urgent reduces the time to receive a response
to 0.


We call a repair admissible if the repaired system is functionally equivalent to 
the unrepaired system. The repair analysis implemented in TARTAR returns the 
complete set of admissible repairs. 

The repair analysis combines concepts and algorithms from model checking, 
constraint solving, and automata theory. A real-time model checker is used to 
generate TDTs for a given NTA that violate a given timed safety property. TAR- 
TAR translates the TDT into a linear real arithmetic constraint system. An SMT 
solver is used to compute a repair for the generated constraint system by solv- 
ing a MaxSMT problem. An automata-based language equivalence test checks 
whether the repair is admissible in the NTA model. The collaboration between 
these subcomponents yields a complex tool architecture. We provide insights into 
the design and implementation of this architecture and the underlying infras- 
tructure of supporting tools. We evaluate the new repair analyses by applying 
TARTAR to a number of NTA models. We systematically inject different mod- 
ifications in these correct models and compute repairs for the obtained faulty 
models, which results in at least one admissible repair for 69% of the TDTs. 


Related Work. Other tools exist that compute repairs. The tool BugAssist
[JM11] analyzes C code by solving a MaxSMT problem. The tool
ReAssert [DDG+11] checks a set of possible modifications to repair broken unit
tests. Angelix [MYR16], S3 [LCL+17] and SemFix [NQRC13] compute repairs
by symbolic execution and constraint solving. SketchFix [HZWK18] is based on
lazy candidate generation. None of these tools repairs broken timing constraints.
We are not aware of related work on tools for the repair of timed automata 
models. A more comprehensive overview of related work on automated repair is 
given in [LPR19]. A discussion of work related to the foundations of our repair 
analysis can be found in [KLW19]. 


2 New Types of Repair Analyses 


The repair analysis presented in [KLW19] and implemented in the prototype
version of TARTAR encodes a TDT as a constraint system in linear real arithmetic.
It computes syntactically correct modifications of the underlying NTA by
introducing bound variation variables v. For example, possible bound modifications
for a clock bound x < 2 are expressed by a modified clock bound x < 2 + v.
The repairs are computed by solving a partial SMT problem on the TDT constraint
system, involving soft-assert constraints on the bound variation variables.
No repair is computed whenever the soft assertion v = 0 holds; otherwise the
computed value of v characterizes the repair. In the following we sketch the new
types of repairs implemented in TARTAR. For a more comprehensive description, 
which space limitations do not allow us to provide here, we refer to [KLW20]. 

Operator Variation Repair Analysis. This analysis is motivated by the assumption
that a wrong comparison operator in a location invariant or transition guard
may cause a property violation. We assume for the repair encoding that the operators
$\sim$ are indexed according to their order in the sequence $(<, \leq, =, \geq, >)$.
The possible repairs are encoded by a fresh variation variable $v^{op}$ where the
value of $v^{op}$ is the index of the corresponding comparison operator. If x < 4 is
computed as a repair, then $v^{op} = 1$. Using this repair analysis, TARTAR finds
two admissible repairs for the example in Figs. 1(a) and (b) that replace the
comparison operator in the clock constraint w >= 1 by < or <=, respectively.


Clock Reference Repair Analysis. This analysis aims to repair property violations 
resulting from errors that stem from the unintended use of a wrong clock variable. 
We enumerate all the positions of clock variables in clock bound constraints
using index $i$ and all clock variables using index $k$. We then introduce for every
position $i$ a fresh variation variable $v^{clk}_i$ whose value $k$ indicates the clock $c_k$ to
be used at that position in the repaired model. For example, if y < 2 is a repaired
constraint, where the position of y in the constraint has index 3 and clock y has
index 1, then $v^{clk}_3 = 1$. Applying this repair analysis to the examples in Figs. 1(a)
and (b), TARTAR finds 13 admissible clock reference modification repairs, each 
involving two modifications. Nine repairs exchange y in the constraints y < 1 
and y > 1 by a selection from the set of clocks z, x and w. Four repairs exchange 
y in the constraint y < 1 by w or a, and w in the constraint w > 1 by y or z. 


Reset Clock Repair Analysis. This analysis aims to repair a property violation 
by adding or removing clock resets. We introduce a variation variable $v^{reset}_{i,j}$ for
each clock $c_i$ and the transition leaving location $l_j$ in the TDT. The reset status
in the extended constraint system is inverted when $v^{reset}_{i,j} \neq 0$: if $c_i$ was not reset
before, it will now be reset, and vice versa. Applying the reset repair analysis to
the examples in Figs. 1(a) and (b), TARTAR finds four admissible repairs. One 
repair removes the reset of clock y, another removes the reset of clock z and 
two repairs add a reset of clock x either on the transitions towards the state 
reqProcessing or the transition towards the state serReceiving. 


Urgent Location Repair Analysis. This analysis aims to repair cases where a 
faulty usage of urgent locations, which are always left with zero delay after 
entering, causes a property violation. Urgency of a location is modeled in the 
TDT constraint system by setting the location delay $\delta_j$ to 0. We define a fresh
variation variable $v^{urg}_j$ for a location $l_j$. For $v^{urg}_j \neq 0$, the urgency of location $l_j$
is inverted. Applying the urgent location repair analysis to the examples in
Figs. 1(a) and (b), TARTAR finds two inadmissible repairs. The first one makes
the state reqAwaiting urgent, and the second makes the state serReceiving
urgent.


3 Usage of TarTar 


We have implemented all repair analyses described in [KLW19] and in this paper
in a tool named TARTAR. It provides a graphical user interface, a command-line
interface and a web interface, which enables the execution of this resource-intensive
software on compute servers. A user selects one of these interfaces via
arguments provided when invoking the Java library implementing TARTAR. For
real-time model checking, TARTAR relies on Uppaal.


— The argument -web launches the web server and corresponding interface. 

— Any other argument launches the command-line mode. When using the argument
—help, the command-line console prints some help information.

— When no arguments are given, the graphical user interface depicted in 
Fig. 2(a) is launched. The interface offers three tabs. New Analysis starts a 
repair analysis, New Experiment starts fault seeding which is described later 
in Sect. 5, and Version shows the current version number of TARTAR. 


All tool interfaces expect the same types of inputs in order to start a TARTAR 
analysis run. The user specifies a file containing the Uppaal model as input 
and selects the kind of repair to compute. Optionally, a file with a TDT of 
the given Uppaal model can be specified. When no TDT is provided, TARTAR 
automatically calls Uppaal to compute a TDT. The result of an analysis is 
one repaired model file for every computed repair, as well as a text file that 
summarizes which repairs are admissible. 
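
The interface selection described above can be sketched as a small dispatch function. This is only an illustration of the three rules stated in the text: the name select_interface and the returned labels are hypothetical, and the actual tool is a Java library whose entry point may be organized differently.

```python
def select_interface(args):
    """Map command-line arguments to one of the three user interfaces."""
    if not args:
        return "gui"   # no arguments: graphical user interface
    if "-web" in args:
        return "web"   # -web: web server and web interface
    return "cli"       # any other argument: command-line mode


print(select_interface([]))
print(select_interface(["-web"]))
print(select_interface(["model.xml"]))
```

The file name model.xml in the last call is a placeholder for any non-web argument.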

Fig. 2. TARTAR tool 


4 Software Architecture and Implementation of TarTar 


The software architecture of TARTAR is depicted in Fig. 2(b). The orange rect- 
angles in the figure represent external tools that TARTAR calls in the course of 
the repair analysis. Uppaal is a state-of-the-art and closed-source model checking 
tool, which TARTAR uses to compute a TDT for a given model and property. 
The SMT solver Z3 [dMB08] is used to solve the generated partial MaxSMT 
problems. To check the admissibility of a repair, TARTAR uses opaal and the 
AutomataLib component of LearnLib [IHS15] since they conveniently provide 
functionality used during admissibility checking. 

Data Flow Architecture. TARTAR consists of many computation steps. For example,
a TDT is parsed internally and stored as a Trace. This Trace is then modified
and exported as SMT-LIB2 [BFT17] code. We define a computation step of
TARTAR as the computation transforming input into result artifacts. This focus 
on artifacts ensures a highly cohesive architecture and clear interfaces between 
any two computation steps. Computation steps with identical objectives are 
grouped into a project. This results in four projects depicted by blue rectangles 
in Fig. 2(b). 


— HMI denotes the user interfaces of TARTAR. The user inputs a timed model. 
TARTAR then calls the project Repair Computation using a faulty timed 
model as a parameter. In case that the model is correct, TARTAR calls the 
project Fault Seeding. 

— Fault Seeding seeds faults into a correct model and then repairs the faulty 
model by computing repairs using Repair Computation. We use this analysis 
in Sect. 5 in order to benchmark the Repair Computation analyses. 

— Repair Computation computes candidate repairs for a faulty timed model, 
applies these repairs to the model and finally automatically calls the Admis- 
sibility Test. 

— Admissibility Test checks for every repaired model whether the computed 
repair is also admissible. 
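
To make the idea behind the Admissibility Test more concrete, the following sketch compares the action traces of two finite transition systems by exploring pairs of states breadth-first. It assumes deterministic systems given as nested dictionaries in which every state is accepting; these simplifications, and the name find_difference, are assumptions for this example. TARTAR itself delegates this check to opaal and AutomataLib, whose APIs are not shown here.

```python
from collections import deque


def find_difference(system_a, system_b, start_a, start_b, actions):
    """Return None if both systems allow the same action traces,
    otherwise a shortest trace that only one of them allows.

    Each system maps a state to a dict {action: successor state}.
    """
    seen = {(start_a, start_b)}
    queue = deque([(start_a, start_b, [])])
    while queue:
        state_a, state_b, trace = queue.popleft()
        for action in actions:
            next_a = system_a.get(state_a, {}).get(action)
            next_b = system_b.get(state_b, {}).get(action)
            if (next_a is None) != (next_b is None):
                return trace + [action]  # enabled in exactly one system
            if next_a is not None and (next_a, next_b) not in seen:
                seen.add((next_a, next_b))
                queue.append((next_a, next_b, trace + [action]))
    return None


original = {"idle": {"req": "wait"}, "wait": {"ser": "idle"}}
repaired = {"p": {"req": "q"}, "q": {"ser": "p"}}
too_strict = {"p": {"req": "q"}, "q": {}}
print(find_difference(original, repaired, "idle", "p", ["req", "ser"]))
print(find_difference(original, too_strict, "idle", "p", ["req", "ser"]))
```

In the usage example, the second candidate removes the response action, so the sketch reports the trace req, ser as a witness of the difference.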


Control Flow Architecture. TARTAR iteratively computes a set of repairs for a
given faulty Uppaal model and a given property $\Pi$ using the following steps:


0. Counterexample Creation. TARTAR calls Uppaal to verify the model against
$\Pi$. In case $\Pi$ is violated, it stores a shortest symbolic TDT witnessing the
violation in XML format.

1. Diagnostic Trace Creation. TARTAR parses the model and the TDT into a 
data structure Trace. To add potential repairs, TARTAR copies the trace and 
replaces the constraints that will potentially be subject to a repair by their 
modified variants. The modified trace is then translated to a logic constraint 
system, represented in SMT-LIB2 code. 

2. Repair Computation. Z3 [dMB08] then solves a MaxSMT problem on the
modified trace constraint system, computing a repair in which the number 
of unmodified constraints on the variation variables of type v = 0 is maxi- 
mized. Since Z3 can solve a MaxSMT problem only for quantifier-free linear 
real arithmetic, TARTAR first runs a quantifier elimination on the constraint 
system. It then solves the MaxSMT problem with soft constraints requir- 
ing v = 0 for all variation variables. For a more comprehensive presentation 
of this construction we refer the reader to [KLW20]. In case no solution is 
found, TARTAR terminates. Otherwise, TARTAR applies the repair to the 
faulty model and returns a repaired model. 

3. Admissibility Check. TARTAR checks the admissibility of a repair by comparing
the untimed languages of the faulty and repaired models. TARTAR
calls the model checker opaal in order to compute the timed transition systems
(TTS) of the original and the repaired Uppaal model. We modified the
opaal model checker in such a way that it returns the TTS for a model. TARTAR
then checks whether the two TTS have equivalent untimed languages,
in which case the repair is admissible. This check is implemented using the
library AutomataLib. In case the two TTS are not equivalent, the admissibility
test returns a trace as a witness for the difference.

4. Iteration. TARTAR enumerates all repairs, i.e., all combinations of constraint 
modifications that correct the TDT. The repairs are iteratively enumerated 
starting with the ones that require the smallest number of modifications to 
the model. After a repair is computed, the combination of modified variables 
that has been found is prevented from being reconsidered for future repairs by 
setting these modification variables to 0 using hard asserts. TARTAR then pro- 
ceeds with attempting to compute further, previously unconsidered repairs. 
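
The iteration scheme of steps 2 and 4 can be illustrated with a toy enumeration that tries combinations of variation variables in order of increasing size. This is a sketch under strong assumptions: the MaxSMT call to Z3 is replaced by brute force over a small set of candidate values, and the functions enumerate_repairs and removes_violation as well as the example bounds are invented for illustration.

```python
from itertools import combinations, product


def enumerate_repairs(num_vars, candidate_values, is_repair):
    """Yield repairs as {variable index: nonzero variation}, fewest changes first.

    Each combination of modified variables yields at most one repair and is
    then left behind, which plays the role of the hard asserts in step 4.
    """
    for size in range(1, num_vars + 1):
        for combo in combinations(range(num_vars), size):
            for values in product(candidate_values, repeat=size):
                variation = [0] * num_vars  # v = 0 means "constraint unchanged"
                for index, value in zip(combo, values):
                    variation[index] = value
                if is_repair(variation):
                    yield dict(zip(combo, values))
                    break


# Toy trace: two delays with upper bounds 2 and 3; the property requires
# that the worst-case total delay does not exceed 4 time units.
BOUNDS = [2, 3]
LIMIT = 4


def removes_violation(variation):
    repaired = [bound + v for bound, v in zip(BOUNDS, variation)]
    return all(b >= 0 for b in repaired) and sum(repaired) <= LIMIT


repairs = list(enumerate_repairs(len(BOUNDS), [-1, -2], removes_violation))
print(repairs)
```

In the toy example, lowering either bound by one already removes the violation, so the two single-variable repairs are reported before the repair that modifies both bounds.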


Component Architecture. We implemented TARTAR with the general
infrastructure depicted in Fig. 3. The interface Job provides a general
abstraction for an algorithm and specifies the necessary input and result
values of the algorithm by the class Description. TARTAR contains a Job
for the projects Fault Seeding, Repair Computation and Admissibility Test.
The class Session executes a Job and derivations of Session provide the different
interfaces to the user. With this infrastructure, the analysis implementation in
TARTAR is independent from the implementation of the user interfaces, thus
reducing coupling and improving modifiability of the code.

Fig. 3. TARTAR component architecture
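
The separation between Job, Description and Session might look roughly as follows. The sketch is a Python analogue written for illustration only: TARTAR is implemented in Java, and all method names, fields and the concrete job shown here are hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Description:
    """Input and result values of one algorithm run."""
    inputs: dict
    results: dict = field(default_factory=dict)


class Job(ABC):
    """General abstraction for an algorithm."""

    @abstractmethod
    def run(self, description: Description) -> Description:
        ...


class AdmissibilityTestJob(Job):
    # Stand-in logic: compares two precomputed sets of action traces.
    def run(self, description):
        inputs = description.inputs
        description.results["admissible"] = (
            inputs["original_traces"] == inputs["repaired_traces"])
        return description


class Session:
    """Executes a Job; subclasses add a concrete user interface."""

    def execute(self, job, description):
        return job.run(description)


class CommandLineSession(Session):
    def execute(self, job, description):
        result = super().execute(job, description)
        print(result.results)
        return result


outcome = CommandLineSession().execute(
    AdmissibilityTestJob(),
    Description({"original_traces": {("req", "ser")},
                 "repaired_traces": {("req", "ser")}}))
```

Because a Session only depends on the Job interface, a new user interface can be added as a further Session subclass without touching the analysis code.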


Implementation Details. We implemented the different projects that constitute 
TARTAR in Java and use the build-management tool maven [Mav19] to manage 
the dependencies between the projects. TARTAR interacts differently with the 
external tools that are needed for different purposes. It calls Uppaal via the 
command-line interface in order to generate a TDT and calls Z3 via its API to 
compute a repair. For the admissibility check, it calls opaal using a command-line 
script and the AutomataLib as an included Java library. For the implementation 
of the TARTAR analyses the following two details are essential. 

We modify constraints in an Uppaal model in order to apply a repair or 
to seed a fault. Since neither clock constraints nor transitions possess explicit 
unique identifiers in an Uppaal model, it is not obvious which constraint to 
change. We therefore uniquely identify a constraint by traversing the constraints 
in the sequence stored in the Uppaal model file and use the constraint index in 
this sequence as its identifier. 
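
A minimal sketch of such index-based identification is given below. It assumes that invariants and guards are stored in label elements with a kind attribute and that conjuncts are separated by &&, as in Uppaal XML model files; the function name and the sample model are invented, and the sketch does not distinguish clock constraints from other guard conjuncts.

```python
import xml.etree.ElementTree as ET

SAMPLE_MODEL = """<nta><template>
  <location id="id0"><label kind="invariant">x &lt;= 4</label></location>
  <transition><label kind="guard">y &gt;= 1 &amp;&amp; z &lt;= 2</label></transition>
</template></nta>"""


def index_constraints(model_xml):
    """Number all invariant and guard conjuncts in document order."""
    root = ET.fromstring(model_xml)
    constraints = []
    for label in root.iter("label"):  # iter() visits elements in file order
        if label.get("kind") in ("invariant", "guard") and label.text:
            for conjunct in label.text.split("&&"):
                constraints.append(conjunct.strip())
    return dict(enumerate(constraints))


constraint_ids = index_constraints(SAMPLE_MODEL)
print(constraint_ids)
```

The identifiers are stable only as long as the order of constraints in the file does not change, which is why the same traversal has to be used for seeding a fault and for applying a repair.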

The complexity of the algorithms for solving quantifier elimination and the
MaxSMT problem increases exponentially with the number of variables in the
SMT model [KLW19]. We therefore reduce the number of variables by exploiting
implied equality constraints. For example, a variable $c_j$ is created for every
clock $c$ in every step $j$ of the TDT. We eliminate $c_j$ explicitly before quantifier
elimination by replacing it with the term $\sum_{i=r}^{j} d_i$, where $d_i$ is the time delay
at step $i$ in the trace and $r$ is the last step before $j$ where $c$ was reset.
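
The substitution can be illustrated numerically: the value of a clock at a step is the sum of the delays since its last reset. The sketch below follows the summation bounds as stated in the text; the function name, the list-based representation, and the convention that a clock without any earlier reset counts from step 0 are assumptions.

```python
def clock_value(delays, reset_steps, j):
    """Value of a clock at step j as the sum of delays d_r, ..., d_j,
    where r is the last step before j at which the clock was reset."""
    earlier_resets = [r for r in reset_steps if r < j]
    r = max(earlier_resets) if earlier_resets else 0
    return sum(delays[r:j + 1])


delays = [1.0, 0.5, 2.0, 1.5]
print(clock_value(delays, {1}, 3))    # delays of steps 1..3
print(clock_value(delays, set(), 3))  # never reset: delays of steps 0..3
```

Expressing every clock value through the delay variables in this way means that no separate clock variable per step is needed in the constraint system.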


5 Evaluation 


Evaluation Strategy. In order to evaluate the repair analyses both qualitatively 
and quantitatively, we need to synthesize a set of faulty timed automata. To the 
best of our knowledge, no benchmark suite for faulty timed automata exists. We 
therefore create faulty models by using the fault seeding strategy from [KLW19] 
which is motivated by ideas from mutation testing [JH11]. Mutation testing eval- 
uates the quality of a test suite for a given program by systematically corrupting 
program code and determining the ratio of corruptions that the test suite is able 
to detect. We apply the same principle to evaluate the quality of our repair 
technique. As proposed in [KLW19], fault seeding modifies a single clock constraint
so that the result is a set of models that violate a given property. During
the seeding, the bound of a single clock constraint is modified by an amount
in {-10, -1, +1, +0.1M, +M}, where M is the maximal clock bound occurring
in a given model. Our observation was that making either small modifications
that are close to the bound value or modifications in the order of the maximal
bound value M often introduces actual errors in the model. We have extended
fault seeding to the new types of repairs. In particular, fault seeding additionally
exchanges the comparison operator in a clock constraint by one of {<, ≤, =, ≥, >},
swaps a referenced clock with all other clocks occurring in the given model, modifies
the reset clocks of any transition, and switches for any location whether it is
urgent. TARTAR checks automatically whether a modified TA violates a given
property. If this is the case, it performs all of the above defined repair analyses.
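
The bound and operator mutations described above can be sketched as two small generators. The function names and the operator spellings are assumptions for this example, and the sketch does not filter out mutants that a real implementation might reject, such as negative bounds.

```python
OPERATORS = ["<", "<=", "==", ">=", ">"]


def seed_bound_faults(bound, max_bound):
    """Mutated bounds for offsets -10, -1, +1, +0.1M and +M."""
    offsets = [-10, -1, 1, 0.1 * max_bound, max_bound]
    return [bound + offset for offset in offsets]


def seed_operator_faults(operator):
    """All comparison operators that differ from the original one."""
    return [op for op in OPERATORS if op != operator]


# Constraint x <= 2 in a model whose maximal clock bound M is 10.
print(seed_bound_faults(2, 10))
print(seed_operator_faults("<="))
```

Each mutant is only kept as a faulty model if the model checker reports a property violation for it, as described in the text.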


Results. We applied fault seeding to the models in [KLW19] and analyzed
the obtained TDTs using the above described repair analyses implemented in
TARTAR. All analyses were performed on a computer with an i7-6700K CPU
(4.00 GHz), 60 GB of RAM and a 64 bit Linux operating system. We summarize
the results of the experiment per considered model (Table 1) and per type of
considered repair (Table 2). Column #Sd contains the count of seeded faults that
result in a number #T of faulty models. T_UP is the maximal time that Uppaal
needs to create a TDT for the faulty models, and the longest TDT has a length of
Ln. TARTAR computed for the TDTs overall a number #R of repairs, of which #A
are admissible. An admissible repair is found for #S of the TDTs. The computation
effort for a repair analysis is given by the time T_QE for successful quantifier
elimination, the number of timeouts #O of quantifier eliminations after 10 min,
the average time T_R to compute a repair and the memory consumption M_R. The
constraint system that Z3 solves has the count #Vr of variables and #Cn of constraints.
The effort for the admissibility check is given in time T_Adm and memory
M_A. All times are given in seconds and memory consumption in MB. Notice that
we omit the columns pertaining to the fault seeding and TDT computation in
Table 2 as they are irrelevant here.

Table 1. Experimental results according to model.


Model     | #Sd   | #T | T_UP  | Ln  | #R    | #A  | #S | T_QE    | #O | T_R   | M_R   | #Vr   | #Cn   | T_Adm   | M_A
db rep.   | 110   | 13 | 0.016 | 4   | 229   | 138 | 9  | 89.346  | 2  | 0.911 | 14.53 | 30    | 91    | 2.080   | 45
csma      | 191   | 10 | 0.012 | 2   | 70    | 26  | 8  | 0.049   | 0  | 0.023 | 0.58  | 16    | 72    | 1.825   | 75
elevator  | 88    | 5  | 0.011 | 1   | 7     | 5   | 4  | 0.049   | 0  | 0.020 | 0.53  | 6     | 28    | 1.665   | 17
viking    | 310   | 9  | 0.015 | 18  | 9     | 7   | 5  | 86.539  | 21 | 1.436 | 20.07 | 120   | 180   | 1.952   | 543
bando     | 1,955 | 40 | 0.111 | 279 | 4,061 | 209 | 21 | 31.555  | 46 | 4.922 | 20.86 | 1,156 | 8,144 | 19.57   | 1,251
Pacemaker | 1,187 | 12 | 0.022 | 9   | 62    | 19  | 10 | 0.663   | 20 | 0.325 | 2.59  | 116   | 988   | 1.994   | 206
SBR       | 353   | 50 | 0.027 | 84  | 751   | 660 | 31 | 117.057 | 86 | 2.686 | 37.16 | 765   | 1,211 | 138.004 | 211
FDDI      | 314   | 36 | 0.014 | 11  | 166   | 105 | 34 | 29.859  | 51 | 3.074 | 9.70  | 116   | 272   | 2.241   | 128


Overall, TARTAR seeded 4,508 faults. This resulted in 175 TDTs in total 
(60 TDTs due to bound modification, 72 due to operator variation, 27 due to 
changing the clock reference, 8 due to complementing the reset of clocks and 
8 due to the switching of urgent locations). TARTAR found 5,355 repairs, out 
of which 1,169 were admissible. It found at least one admissible repair for 122 
of the TDTs. The maximal number of modified constraints in the admissible 
repairs computed for a single TDT using all types of analysis was 25. 


Table 2. Experimental results according to type of repair.


Repair             | #R    | #A  | #S | T_QE    | #O  | T_R   | M_R   | #Vr   | #Cn   | T_Adm   | M_A
Bound Modification | 533   | 364 | 85 | 15.209  | 8   | 4.922 | 20.86 | 1,156 | 2,498 | 138.004 | 525
Operator Variation | 3,929 | 96  | 51 | 117.057 | 44  | 2.686 | 37.16 | 996   | 8,144 | 59.117  | 543
Clock Reference    | 693   | 625 | 35 | 33.282  | 61  | 3.074 | 14.13 | 1,120 | 5,355 | 116.944 | 206
Reset Clock        | 45    | 37  | 13 | 89.346  | 113 | 0.911 | 14.53 | 996   | 2,836 | 2.051   | 45
Urgent Location    | 155   | 47  | 37 | 0.107   | 0   | 0.135 | 3.16  | 1,120 | 2,502 | 58.551  | 1,251


Interpretation. Few of the seeded faults resulted in a property violation. TARTAR
seeded 4,508 faults which led to 175 TDTs, thus only 3.9% of these faults result in
a TDT. This supports the hypothesis that, in practice, often only a few timing
constraints have an impact on a property violation. TARTAR computes at least
one admissible repair by bound modification for 85 (48%) of the 175 TDTs, by 
operator variation for 51 (29%), by clock reference for 35 (20%), by clock reset for 
13 (7%) and by urgent location for 37 (21%). Every analysis on its own computes
fewer admissible repairs than the combination of all repair analyses, which solves
122 (69%) of the 175 TDTs. The largest number of modified constraints in all
the admissible repairs for a single TDT was 25, which is less than anticipated.
This low number of modified constraints implies that, for the examples that we
considered, only a few constraints of each TDT combined to admissible repairs.
The number of modified constraints determines the number of possible repairs 
that have an impact on whether a property is violated or not. Since it was 
observed in [KLW19] that the computational effort for the repair computation is 
largely determined by the quantifier elimination step, we expect that in light of
the observed 226 timeouts a more efficient quantifier elimination would lead to
a significantly higher number of repairs. Furthermore, the number of timeouts,
and thus the computation time needed for the repair, rises with the length of
the analyzed TDT. The model SBR has the most timeouts with 86 and the
second longest trace with a length of 84 steps. The model bando has the third
most timeouts with 46 and the longest trace. Obviously, the longer the TDT,
the larger the resulting constraint system, leading to increased computational 
effort. The bando model has the largest constraint system with 1,156 variables 
and 8,144 constraints. The SBR model has the second largest constraint system 
with 765 variables and 1,211 constraints. The model FDDI has a shorter trace 
of length of 11 and a much smaller constraint system with 116 variables and 
272 constraints. From this we conclude that the complexity of a repair depends 
not only on the trace length, but also on the intrinsic complexity of the model. 
Modifying states from urgent to non-urgent during fault seeding resulted in 
only 8 TDTs. This low number is due to the observation that the considered 
models contain only few urgent states. Modifying non-urgent states to urgent 
ones, however, did not lead to a single property violation resulting in a TDT. 
The rationale is that urgency forces a state to be left immediately, without any
delay, which leads to a restriction rather than a relaxation regarding the time
budget spent along an execution trace. As a consequence, making a state urgent 
does not cause a property violation in many models since the type of the checked 
properties is typically time bounded reachability, and a restricted time budget 
does not make it more likely that the property is violated. We finally observe 
that the admissibility check requires more computation resources than the repair 
computation. The maximal memory used for the admissibility test was 1,251 MB 
in contrast to 37.16MB for the repair computation. This is in line with our 
expectation since the admissibility test searches the state space of the full NTA,
while the repair analysis only considers a single TDT.
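
The percentages quoted in this interpretation can be recomputed from the counts in Tables 1 and 2. The short script below does this; the variable names are chosen for this example. The reported success rates match truncated rather than rounded percentages (for instance, 122 of 175 is about 69.7%).

```python
SEEDED_FAULTS = 4508
TDTS = 175
SOLVED_BY_ANALYSIS = {
    "bound modification": 85,
    "operator variation": 51,
    "clock reference": 35,
    "reset clock": 13,
    "urgent location": 37,
}
SOLVED_BY_ANY = 122

# Share of seeded faults that led to a property violation, in percent.
violation_rate = round(100 * TDTS / SEEDED_FAULTS, 1)

# Share of TDTs with at least one admissible repair, truncated to integers.
shares = {name: 100 * count // TDTS for name, count in SOLVED_BY_ANALYSIS.items()}
combined_share = 100 * SOLVED_BY_ANY // TDTS

print(violation_rate, shares, combined_share)
```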


6 Conclusion 


We have presented the TARTAR tool, its architecture and implementation, and 
illustrated its application to a number of significant case studies. In the course 
of our work we have extended the repair analysis that is implemented in TAR- 
TAR for bound modification to modifications of comparison operators, clock 
references, reset of clocks and missing urgencies. The evaluation of the repair 
analyses showed that an admissible repair is computed for at least 69% of the 
analyzed TDTs. The integration of various tools with heterogeneous interfaces 
posed a particular challenge to the architecture of TARTAR which we addressed 
by the definition of intermediate artifacts. 

In future work we plan to explore the interplay between different repairs that 
are computed for a repaired system that still violates a property, and develop 
refined strategies to select promising repairs from a repair set. A further gener- 
alization of the analysis is to not only compute clock constraint modifications 
for faulty models but also to compute possible relaxations of clock constraints 
for correct models in order to support design space exploration. 

Abstract. We introduce SAW, a tool for safety analysis of weakly-hard 
systems, in which traditional hard timing constraints are relaxed to allow 
bounded deadline misses for improving design flexibility and runtime 
resiliency. Safety verification is a key issue for weakly-hard systems, as it 
ensures system safety under allowed deadline misses. Previous works are 
either for linear systems only, or limited to a certain type of nonlinear 
systems (e.g., systems that satisfy exponential stability and Lipschitz 
continuity of the system dynamics). In this work, we propose a new 
technique for infinite-time safety verification of general nonlinear weakly- 
hard systems. Our approach first discretizes the safe state set into grids 
and constructs a directed graph, where nodes represent the grids and 
edges represent the reachability relation. Based on graph theory and 
dynamic programming, our approach can effectively find the safe initial 
set (consisting of a set of grids), from which the system can be proven safe 
under given weakly-hard constraints. Experimental results demonstrate 
the effectiveness of our approach, when compared with the state-of-the-art.
An open source implementation of our tool is available at
https://github.com/551100kk/SAW. The virtual machine where the tool is ready
to run can be found at https://www.csie.ntu.edu.tw/~r08922054/SAW.ova.


Keywords: Weakly-hard systems · Safety verification · Graph theory 


1 Introduction 


Hard timing constraints, where deadlines should always been met, have been 
widely used in real-time systems to ensure system safety. However, with the 
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Fig. 1. A weakly-hard system with perfect sensors and actuators. 


rapid increase of system functional and architectural complexity, hard deadlines 
have become increasingly pessimistic and often lead to infeasible designs or over 
provisioning of system resources [16,20,21,32]. The concept of weakly-hard sys- 
tems are thus proposed to relax hard timing constraints by allowing occasional 
deadline misses [2,11]. This is motivated by the fact that many system func- 
tions, such as some control tasks, have certain degrees of robustness and can 
in fact tolerate some deadline misses, as long as those misses are bounded and 
dependably controlled. In recent years, considerable efforts have been made in 
the research of weakly-hard systems, including schedulability analysis
[1,2,5,12–14,19,25,28,30], opportunistic control for energy saving [18], control
stability analysis and optimization [8,10,22,23,26], and control-schedule co-design under
possible deadline misses [3,6,27]. Compared with hard deadlines, weakly-hard 
constraints can more accurately capture the timing requirements of those system 
functions that tolerate deadline misses, and significantly improve system feasi- 
bility and flexibility [16,20]. Compared with soft deadlines, where any deadline 
miss is allowed, weakly-hard constraints could still provide deterministic guaran- 
tees on system safety, stability, performance, and other properties under formal 
analysis [17,29]. 

A common type of weakly-hard model is the (m, K) constraint, which specifies
that among any K consecutive task executions, at most m instances could
violate their deadlines [2]. Specifically, the high-level structure of an
(m, K)-constrained weakly-hard system is presented in Fig. 1. Given a
sampled-data system $\dot{x} = f(x, u)$ with a sampling period $\delta > 0$,
the system samples the state $x$ at the time $t = i\delta$ for
$i = 0, 1, 2, \ldots$, and computes the control input $u$ with function
$\pi(x)$. If the computation completes within the given deadline, the system
applies $u$ to influence the plant's dynamics. Otherwise, the system stops
the computation and applies zero control input. As mentioned above, the system
should ensure the control input can be successfully computed and applied within
the deadline at least $K - m$ times over any K consecutive sampling periods.
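
As an illustration of the (m, K) constraint described above, the following minimal sketch checks a recorded sequence of deadline outcomes with a sliding window. The function name `satisfies_mk` and the 0/1 encoding of misses are assumptions made for this example; they are not part of SAW.

```python
def satisfies_mk(misses, m, K):
    """Return True if every window of K consecutive jobs has at most m misses.

    misses: sequence of 0/1 flags, where 1 marks a deadline miss.
    A sequence shorter than K is treated as a single partial window.
    """
    if K <= 0 or m < 0:
        raise ValueError("require K > 0 and m >= 0")
    window = sum(misses[:K])              # misses in the first window
    if window > m:
        return False
    for i in range(K, len(misses)):
        # Slide the window by one job: add the new flag, drop the oldest.
        window += misses[i] - misses[i - K]
        if window > m:
            return False
    return True


# A miss every third job satisfies (1, 3); two misses in five violate (1, 5).
print(satisfies_mk([1, 0, 0, 1, 0, 0, 1, 0, 0], 1, 3))
print(satisfies_mk([1, 1, 0, 0, 0], 1, 5))
```

The sliding window keeps the check linear in the length of the sequence, independent of K.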

For such weakly-hard systems, a natural and critical question is whether the
system remains safe when deadline misses are allowed as defined by a given
(m, K) constraint. There is only limited prior work in this area, while nominal
systems have been adequately studied [4,9,15,31]. In [8], a weakly-hard system
with linear dynamics is modeled as a hybrid automaton and then the reachability
of the generated
hybrid automaton is verified by the tool SpaceEx [9]. In [7], the behavior of a lin- 
ear weakly-hard system is transformed into a program, and program verification 
techniques such as abstract interpretation and SMT solvers can be applied. 

In our previous work [17], the safety of nonlinear weakly-hard systems is
considered for the first time. Our approach tries to derive a safe initial set
for any given (m, K) constraint, that is, starting from any initial state within
such a set, the system will always stay within the same safe state set under the
given weakly-hard constraint. Specifically, we first convert the infinite-time
safety problem into a finite one by finding a set satisfying both local safety
and inductiveness. The computation of such a valid set relies heavily on the
estimation of the system state evolution, where two key assumptions are made:
1) The system is exponentially stable under nominal cases without any deadline
misses, which makes the system state contract with a constant decay rate; 2) The
system dynamics are Lipschitz continuous, which helps bound the expansion under
a deadline miss. Based on these two assumptions, we can abstract the safety
verification problem as a one-dimensional problem and use linear programming
(LP) to solve it, which we call one-dimension abstraction in the rest of the paper.

In practice, however, the assumptions in [17] are often hard to satisfy and
the parameters of exponential stability are difficult to obtain. In addition, while
the scalar abstraction provides high efficiency, the experiments demonstrate that
the estimation is always overly conservative. In this paper, we go one step further
and present a new tool SAW for infinite-time safety verification of nonlinear
weakly-hard systems that makes no particular assumption on exponential
stability or a Lipschitz bound, and aims to be less conservative than the scalar
abstraction. Formally, the problem solved by this tool is described as follows:


Problem 1. Given an (m, K) weakly-hard system with nonlinear dynamics
$\dot{x} = f(x, u)$, sampling period $\delta$, and safe set $X$, find a safe
initial set $X_0$, such that from any state $x(0) \in X_0$, the system will
always be inside $X$.


To solve this problem, we first discretize the safe state set X into grids. We 
then try to find the grid set that satisfies both local safety and inductiveness. 
For each property, we build a directed graph, where each node corresponds to a 
grid and each directed edge represents the mapping between grids with respect 
to reachability. We will then be able to leverage graph theory to construct the 
initial safe set. Experimental results demonstrate that our tool is effective for 
general nonlinear systems. 


2 Algorithms and Tool Design 


The schematic diagram of our tool SAW is shown in Fig. 2. The input is a model 
file that specifies the system dynamics, sampling period, safe region and other 
parameters, and a configuration file of Flow* [4] (which is set by default but can 


546 C. Huang et al. 


[Figure 2: block diagram. File system items: dynamic system model file, Flow*
configuration file, result image. Tool components: Model Parser and Grid
Builder; One-step Graph Constructor; Local Safety and K-step Graph Constructor;
Inductiveness Set Calculator; Result Region Plotter with Gnuplot. Labels on the
diagram: (m, K), $G_1$, $G_K$.]

Fig. 2. The schematic diagram of SAW. 


Algorithm 1: Overall algorithm of SAW

Data: Dynamic system $f$ with safe state region $X$, the control law $\pi$,
      weakly-hard constraint $(m, K)$, sampling period $\delta$
Result: Safe initial state set $X_0$
1  $\Gamma$ = partition($X$, $p$);
   /* Search the grid set that satisfies local safety. */
2  $G_1$ = constructOneStepGraph();
3  $\Gamma_S, G_K$ = calculateLocalSafety();
   /* Search the grid set that satisfies inductiveness. */
4  $\Gamma_I$ = calculateInductivenessSet();
5  return $\Gamma_I$;


also be customized). After being fed the input, the tool works as follows (shown
in Algorithm 1). The safe state set $X$ is first uniformly partitioned into small
grids $\Gamma = \{v_1, v_2, \ldots, v_{p^d}\}$, where
$X = v_1 \cup v_2 \cup \cdots \cup v_{p^d}$, $v_i \cap v_j = \emptyset$
($\forall i \neq j$), $d$ is the dimension of the state space, and $p$ is the
number of partitions in each dimension (Line 1 in Algorithm 1). The tool then
tries to find the grids that satisfy local safety. It first invokes a
reachability graph constructor to build a one-step reachability graph $G_1$ to
describe how the system evolves in one sampling step (Line 2). Then, a dynamic
programming (DP) based approach finds the largest set
$\Gamma_S = \{v_{s_1}, v_{s_2}, \ldots, v_{s_n}\}$ from which the system will
not go out of the safe region. The K-step reachability graph $G_K$ is also built
in the DP process based on $G_1$ (Line 3). After that, the tool searches for the
largest subset $\Gamma_I$ of $\Gamma_S$ that satisfies inductiveness by using a
reverse search algorithm (Line 4). The algorithm outputs $\Gamma_I$ as the target
set $X_0$ (Line 5).
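
The uniform partition in Line 1 of Algorithm 1 can be sketched as follows. This is a minimal illustration, assuming the safe set is an axis-aligned box given as a list of (low, high) pairs; the function name `partition` mirrors the pseudocode, but the data layout is a choice made here, not SAW's internal representation. Neighbouring cells share their boundaries in this sketch.

```python
from itertools import product


def partition(safe_box, p):
    """Split an axis-aligned box into p**d equal cells.

    safe_box: list of (low, high) pairs, one per state dimension.
    Returns a list of cells, each a tuple of (low, high) pairs.
    """
    widths = [(hi - lo) / p for lo, hi in safe_box]
    cells = []
    # Enumerate every combination of per-dimension slice indices.
    for idx in product(range(p), repeat=len(safe_box)):
        cell = tuple(
            (lo + i * w, lo + (i + 1) * w)
            for i, (lo, _), w in zip(idx, safe_box, widths)
        )
        cells.append(cell)
    return cells


# Two dimensions with p = 50 give 50**2 = 2500 cells.
print(len(partition([(-3.0, 3.0), (-3.0, 3.0)], 50)))
```

The number of cells grows as p to the power d, which is why the partition granularity directly drives the size of the graphs built in the later steps.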

The key functions of the tool are the reachability graph constructor, DP- 
based local safety set search, and reverse inductiveness set search. In the following 
sections, we introduce these three functions in detail. 


2.1 Reachability Graph Construction 


Integrating the system dynamics is often the most time-consuming part of
tracing how the states evolve. In this function, we use Flow* to get a valid
Algorithm 2: Construct one-step graph: constructOneStepGraph()

Data: Dynamic system $f$, grid set $\Gamma$, the control law $\pi$, sampling period $\delta$
Result: Directed graph $G_1(\Gamma, E_1)$

   /* Initialize the edge set $E_1$ of $G_1$. */
1  $E_1 \leftarrow \emptyset$;
2  for $v \in \Gamma$ do
       /* Consider deadline miss ($e = 1$)/meet ($e = 0$) respectively. */
3      for $e \in \{0, 1\}$ do
           /* Compute one-step reachable set $R_1(v)$ from $v$. */
4          $R_1(v)$ = Flow*($v$, $\delta$, $e$);
           /* $v$ is unsafe and no edge is added if $X^c \cap R_1(v) \neq \emptyset$. */
5          if $X^c \cap R_1(v) \neq \emptyset$ then continue;
           /* Add an edge pointing to $v'$ from $v$ if $v' \cap R_1(v) \neq \emptyset$. */
6          for $v' \in \Gamma$ do
7              if $v' \cap R_1(v) \neq \emptyset$ then $E_1 \leftarrow E_1 \cup \{(v, e, v')\}$;
8  return $G_1(\Gamma, E_1)$;


overapproximation of the reachable set (represented as flowpipes) starting from
every grid after a sampling period $\delta$. Given a positive integer $n$, the
graph constructed from the reachable set after $n$ sampling periods,
$n \cdot \delta$, is called an n-step graph $G_n$. Since the reachability for all
the grids in any sampling step is independent under our grid assumption, we first
build $G_1$ and then reuse $G_1$ to construct $G_K$ later without redundant
computation of the reachable set.

The one-step graph is built with Algorithm 2. We consider deadline miss and
deadline meet separately, corresponding to two categories of edges (Line 3). For
a grid $v$, if the one-step reachable set $R_1(v)$ intersects with the unsafe
state set $X^c$, then it is considered an unsafe grid and we let its reachable
grid set be $\emptyset$. Otherwise, if $R_1(v)$ intersects with another grid $v'$
under the deadline miss/meet event $e$, then we add a directed edge $(v, e, v')$
from $v$ to $v'$ with label $e$. The number of outgoing edges for each grid node
$v$ is bounded by $p^d$. Assuming that the complexity of Flow* to compute
flowpipes for its internal clock $\epsilon$ is $O(1)$, we can get the overall
time complexity as $O(|\Gamma| \cdot p^d \cdot \delta/\epsilon)$.

The K-step graph $G_K$ is built for finding the grid set that satisfies local
safety and inductiveness. To avoid redundant computation of the reachable set,
we construct $G_K$ based on $G_1$ by traversing K-length paths, as a by-product
of the local safety set searching procedure.
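
The structure of the one-step graph construction can be sketched in one dimension as follows. This is only an illustration: `toy_reach` is a hypothetical stand-in for the Flow* call, using the closed-form one-period map of the scalar system x' = a x + u with a zero-order-hold input u = -k x(0) on a deadline meet and u = 0 on a miss. The parameter values and the dictionary layout of the edges are assumptions of this example.

```python
import math


def toy_reach(cell, e, a=0.5, k=2.0, delta=0.2):
    """Interval reached after one period from `cell` (hypothetical toy model).

    e = 0: deadline met, input -k * x(0) is held over the period.
    e = 1: deadline missed, zero input is applied.
    """
    g = math.exp(a * delta)
    c = g if e == 1 else g - k * (g - 1.0) / a
    ends = (c * cell[0], c * cell[1])
    return min(ends), max(ends)


def build_one_step_graph(cells, safe, reach):
    """Return edges {(v, e): [successor indices]} over 1-D interval cells."""
    edges = {}
    for v, cell in enumerate(cells):
        for e in (0, 1):
            r_lo, r_hi = reach(cell, e)
            if r_lo < safe[0] or r_hi > safe[1]:
                edges[(v, e)] = []      # reachable set leaves the safe set
                continue
            # Closed overlap test, so touching cells are conservatively included.
            edges[(v, e)] = [
                w for w, (lo, hi) in enumerate(cells)
                if lo <= r_hi and r_lo <= hi
            ]
    return edges


cells = [(-2.0, -1.0), (-1.0, 0.0), (0.0, 1.0), (1.0, 2.0)]
edges = build_one_step_graph(cells, (-2.0, 2.0), toy_reach)
print(edges[(3, 0)], edges[(3, 1)])
```

An empty successor list marks the cell as unsafe for that event, matching the convention of Algorithm 2. A real implementation would replace `toy_reach` with a validated flowpipe computation.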


2.2 DP-Based Local Safety Set Search 


We propose a bottom-up dynamic programming approach that considers all the
possible paths, utilizing the overlapping subproblems property (Algorithm 3).
The reachable grid set at step $K$ that is derived from a grid $v$ at step
$k \le K$ with respect to the number of deadline misses $n \le m$ can be defined
as $DP(v, n, k)$. To be consistent with Algorithm 2, this set is empty if and
only if it does not satisfy local safety. We need to derive $DP(v, 0, 0)$.
Initially, the zero-step reachability is




Algorithm 3: Search grid set for local safety: calculateLocalSafety()
Data: Directed graph $G_1(\Gamma, E_1)$, weakly-hard constraint $(m, K)$
Result: Grid set $\Gamma_S$, directed graph $G_K(\Gamma, E_K)$
1  for $v \in \Gamma$ do
2      for $n \leftarrow 0$ to $m$ do
3          $DP(v, n, K) \leftarrow \{v\}$;
4  for $k \leftarrow K - 1$ to $0$ do
5      for $v \in \Gamma$ do
6          for $n \leftarrow 0$ to $m$ do
7              isSafe $\leftarrow$ True;
8              for $e \in \{0, 1\}$ do
9                  if $n + e \le m$ then
10                     nextGrids($v$) $\leftarrow \{v' \mid (v, e, v') \in E_1\}$;
11                     if nextGrids($v$) $= \emptyset$ then isSafe $\leftarrow$ False; break;
12                     for $v' \in$ nextGrids($v$) do
13                         $R(v') \leftarrow DP(v', n + e, k + 1)$;
14                         if $R(v') = \emptyset$ then isSafe $\leftarrow$ False; break;
15                         $DP(v, n, k) \leftarrow DP(v, n, k) \cup R(v')$;
16             if isSafe = False then
17                 $DP(v, n, k) \leftarrow \emptyset$;
18 $\Gamma_S \leftarrow \{v \mid DP(v, 0, 0) \neq \emptyset\}$;
19 $E_K \leftarrow \{(v, v') \mid v' \in DP(v, 0, 0)\}$;
20 return $\Gamma_S, G_K(\Gamma, E_K)$;


straightforward, i.e., $\forall v \in \Gamma, n \in [0, m]$,
$DP(v, n, K) = \{v\}$. The transition is defined as:

$$\forall k \in [0, K-1]: \quad DP(v, n, k) = \bigcup_{\forall v', e:\ (v, e, v') \in E_1,\ n + e \le m} DP(v', n + e, k + 1).$$

If there exists an empty set on the right-hand side or there is no outgoing edge
from $v$ for any $e$ such that $n + e \le m$, we let $DP(v, n, k) = \emptyset$.
Finally, we have $\Gamma_S = \{v \mid DP(v, 0, 0) \neq \emptyset\}$,
$E_K = \{(v, v') \mid v' \in DP(v, 0, 0)\}$.

We use a bitset to implement the set union, which can speed it up by up to 64
times on a 64-bit architecture. The time complexity is
$O(|\Gamma|^2/\mathit{bits} \cdot p^d \cdot K^2 + |\Gamma|^2)$,
where $\mathit{bits}$ depends on the running environment. The $|\Gamma|^2$ term
is contributed by $G_K$.
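
The recurrence above can be sketched as a short bottom-up DP. This is an illustrative reading of the transition rule, not SAW's implementation: cells are integer indices, `edges[(v, e)]` is an assumed dictionary of one-step successors (an empty list marks an unsafe cell), and Python integers play the role of bitsets so that set union is a single `|`.

```python
def local_safety(num_cells, edges, m, K):
    """Bottom-up DP over (cell, misses so far, step).

    Returns (safe_cells, k_step_edges), where k_step_edges[v] lists the cells
    reachable from v after K steps under the (m, K) constraint.
    """
    # dp[n][v]: bitset of cells reachable at step K from v at the current
    # step with n misses so far; 0 (empty set) means local safety fails.
    dp = [[1 << v for v in range(num_cells)] for _ in range(m + 1)]
    for _ in range(K):                        # steps K-1 down to 0
        new = [[0] * num_cells for _ in range(m + 1)]
        for n in range(m + 1):
            for v in range(num_cells):
                acc, safe = 0, True
                for e in (0, 1):
                    if n + e > m:
                        continue              # this miss would break (m, K)
                    succ = edges.get((v, e), [])
                    if not succ or any(dp[n + e][w] == 0 for w in succ):
                        safe = False
                        break
                    for w in succ:
                        acc |= dp[n + e][w]   # bitset union
                new[n][v] = acc if safe else 0
        dp = new
    safe_cells = [v for v in range(num_cells) if dp[0][v]]
    k_edges = {v: [w for w in range(num_cells) if (dp[0][v] >> w) & 1]
               for v in safe_cells}
    return safe_cells, k_edges


# Cell 0 loops on itself, cell 1 falls into the unsafe cell 2 on a miss.
edges = {(0, 0): [0], (0, 1): [0], (1, 0): [0], (1, 1): [2],
         (2, 0): [], (2, 1): []}
print(local_safety(3, edges, m=1, K=2))
```

Only two layers of the table (steps k and k + 1) are kept at a time, which is enough because the transition never looks further ahead than one step.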


2.3 Reverse Inductiveness Set Search 


To find the grid set $\Gamma_I \subseteq \Gamma_S$ that satisfies inductiveness,
we propose a reverse search algorithm (Algorithm 4). Basically, instead of
directly searching for $\Gamma_I$, we try to obtain $\Gamma_I$ by removing any
grid $v$ within $\Gamma_S$ from which there exists a path reaching
$\Gamma_U = \Gamma - \Gamma_S$. Specifically, Algorithm 4 starts by initializing
$\Gamma_U = \Gamma - \Gamma_S$ (line 1). Then $\Gamma_U$ iteratively absorbs each
grid $v$ that can reach $\Gamma_U$ in $K$ sampling periods, until a fixed point
is reached (lines 2-3). Finally, $\Gamma_I = \Gamma - \Gamma_U$ is the largest
set that satisfies inductiveness. It is implemented as a breadth-first search
(BFS) on the reversed graph of $G_K$, and the time complexity is $O(|\Gamma|^2)$.
Algorithm 4: Search grid set for inductiveness: calculateInductivenessSet()

Data: Directed graph $G_K(\Gamma, E_K)$, grid set $\Gamma_S$
Result: Grid set $\Gamma_I$
1  $\Gamma_U \leftarrow \Gamma - \Gamma_S$;
2  while $\exists (v, v') \in E_K$ such that $v \notin \Gamma_U$, $v' \in \Gamma_U$ do
3      $\Gamma_U \leftarrow \Gamma_U \cup \{v\}$;
4  $\Gamma_I = \Gamma - \Gamma_U$;
5  return $\Gamma_I$;
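
The reverse search can be sketched as a breadth-first search over reversed K-step edges. This is a minimal illustration under assumed data structures (integer cell indices and a dictionary `k_edges` mapping each locally safe cell to its K-step successors); it is not SAW's code.

```python
from collections import deque


def inductive_set(num_cells, safe_cells, k_edges):
    """Largest subset of safe_cells that is closed under the K-step edges."""
    unsafe = set(range(num_cells)) - set(safe_cells)
    # Reverse adjacency: preds[w] holds the cells with a K-step edge into w.
    preds = {w: [] for w in range(num_cells)}
    for v, succ in k_edges.items():
        for w in succ:
            preds[w].append(v)
    queue = deque(unsafe)
    while queue:
        w = queue.popleft()
        for v in preds[w]:
            if v not in unsafe:       # v can reach an unsafe cell: absorb it
                unsafe.add(v)
                queue.append(v)
    return sorted(set(range(num_cells)) - unsafe)


# Cell 1 is locally safe but can reach cell 2, so only cell 0 survives.
print(inductive_set(3, [0, 1], {0: [0], 1: [0, 2]}))
```

Each cell enters the queue at most once and each edge is inspected at most once, so the search is linear in the size of the K-step graph.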


3 Example Usage 


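Example 1. Consider the following linear control system from [17]:

$$\begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 0 & -0.1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \begin{bmatrix} 0 \\ 1 \end{bmatrix} u, \qquad u = \begin{bmatrix} -0.375 & -1.15 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix},$$

with $\delta = 0.2$ and step_size = 0.01. The initial state set is
$x_1 \in [-1, 1]$ and $x_2 \in [-1, 1]$. The safe state set is
$x_1 \in [-3, 3]$ and $x_2 \in [-3, 3]$. Following the input format shown in
Listing 1.1, we prepare the model file as Listing 1.2.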

<state_dim> <input_dim> <grid_count>
<state_var_names> <input_var_names>
<state_ode.1>
...
<state_ode.state_dim>
<input_equa.1>
...
<input_equa.input_dim>
<period> <step_size>
<m> <k>
<safe_state.1>
...
<safe_state.state_dim>
<initial_state.1>
...
<initial_state.state_dim>


Listing 1.1. Input format


2 1 50
x1 x2 u
x2
-0.1 * x2 + u
-0.375 * x1 - 1.15 * x2
0.2 0.01
2 5
-3 3
-3 3
-1 1
-1 1


Listing 1.2. example/model1.txt


Then, we run our program with the model file. 


./saw example/model1.txt


To further ease the use of our tool, we also pre-compiled it for the x86_64 Linux
environment. In such an environment, users do not need to compile our tool and
can directly invoke saw_linux_x86_64 instead of saw (which is only available
after manually compiling the tool).


./saw_linux_x86_64 example/model1.txt


The program output is shown in Listing 1.3. Line 6 shows the number of
edges of $G_1$. Lines 8-10 provide the information of $G_K$, including the number
of edges and nodes. Line 12 prints the size of the safe initial set $X_0$. Our
tool then determines whether the given initial set is safe by checking if it is
a subset of $X_0$.
1  [Info] Parsing model.
2  [Info] Building FLOW* configuration.
3  [Info] Building grids.
4  [Info] Building one-step graph.
5  Process: 100.00%
6  [Success] Number of edges: 19354
7  [Info] Building K-step graph.
8  [Success] Start Region Size: 1908
9  End Region: 1208
10 Number of Edges: 102436
11 [Info] Finding the largest closed subgraph.
12 [Success] Safe Initial Region Size: 1622
13 [Info] Calculating area.
14 Initial state region: 4.000000
15 Grids Intersection: 4.000000
16 Result: safe


Listing 1.3. Verification result 
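
The final subset check can be pictured as an area comparison, which is one plausible reading of the "Initial state region" and "Grids Intersection" lines in the output above. The sketch below is an assumption about how such a check might be done for two-dimensional, non-overlapping cells; the function names and the tolerance are hypothetical and not taken from SAW.

```python
def covered_area(initial_box, cells):
    """Area of initial_box covered by a list of non-overlapping 2-D cells."""
    (ax0, ax1), (ay0, ay1) = initial_box
    total = 0.0
    for (bx0, bx1), (by0, by1) in cells:
        w = min(ax1, bx1) - max(ax0, bx0)   # overlap width along x1
        h = min(ay1, by1) - max(ay0, by0)   # overlap height along x2
        if w > 0 and h > 0:
            total += w * h
    return total


def is_safe(initial_box, cells, tol=1e-9):
    """True if the safe cells cover the whole initial box (up to tol)."""
    (ax0, ax1), (ay0, ay1) = initial_box
    area = (ax1 - ax0) * (ay1 - ay0)
    return covered_area(initial_box, cells) >= area - tol


quadrants = [((-1, 0), (-1, 0)), ((-1, 0), (0, 1)),
             ((0, 1), (-1, 0)), ((0, 1), (0, 1))]
print(is_safe(((-1, 1), (-1, 1)), quadrants))       # all four cells present
print(is_safe(((-1, 1), (-1, 1)), quadrants[:3]))   # one quadrant missing
```

When the covered area equals the area of the initial box, every initial state lies in some safe cell, which is the condition reported as safe.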


Table 1. Benchmark setting. ODE denotes the ordinary differential equation of the
example, $\pi$ denotes the control law, and $\delta$ is the discrete control stepsize.


# | ODE | $\pi$ | $\delta$ | Safe state set | (m, K)
1 | $\dot{x}_1 = x_2$; $\dot{x}_2 = -0.1x_2 + u$ | $u = -0.375x_1 - 1.15x_2$ | 0.2 | $x_1 \in [-3.0, 3.0]$, $x_2 \in [-3.0, 3.0]$ | (2, 5)
2 | $\dot{x}_1 = -2x_1 + u_1$; $\dot{x}_2 = -0.9x_2 + u_2$ | $u_1$ = [illegible]; $u_2 = -x_1 - x_2$ | 0.3 | $x_1 \in [-6.0, 6.0]$, $x_2 \in [-6.0, 6.0]$ | (1, 10)
3 | $\dot{x}_1$ = [illegible]; $\dot{x}_2 = -2x_1 - 0.1x_2 + u$ | [illegible] | [illegible] | $x_1 \in [-3.0, 3.0]$, $x_2 \in [-3.0, 3.0]$ | (2, 10)
4 | $\dot{x} = x^2 - x^3 + u$ | $u = -2x$ | 0.6 | $x \in [-4.0, 4.0]$ | (2, 100)
5 | $\dot{x} = 0.2x + 0.03x^2 + u$ | $u = -0.3x$ | 1.6 | $x \in [-2.0, 2.0]$ | (1, 5)
6 | [illegible] | [illegible] | 0.1 | $x_1 \in [-5.0, 5.0]$, $x_2 \in [-5.0, 5.0]$ | (2, 15)


4 Experiments 


We implemented a prototype of SAW that is integrated with Flow*. In this 
section, we first compare our tool with the one-dimension abstraction [17], on the 
full benchmarks from [17] (#1-#4) and also additional examples with no guaran- 
tee on exponential stability from related works (#5 and #6) [24]. Table 1 shows 
the benchmark settings, including the (m, K) constraint set for each benchmark. 
Then, we show how different parameter settings affect the verification results of 
our tool. All our experiments were run on a desktop, with 6-core 3.60 GHz Intel 
Core i7. 


4.1 Comparison with One-Dimension Abstraction 


Table 2 shows the experimental results. It is worth noting that the one-dimension 
abstraction cannot find the safe initial set in most cases from [17]. In fact, it only 
Table 2. Experimental results. ExpParam denotes the parameters of the exponential
stability, where “N/A” means that either the system is not exponentially stable or
the parameters are not available. Initial state set denotes the set that needs to be
verified. The last two columns denote the verification results of the one-dimension
abstraction [17] and SAW, respectively. “—” means that no safe initial set $X_0$ is found
by the tool. p represents the partition number for each dimension in SAW. Time (in
seconds) represents the execution time of SAW.


# | ExpParam | Initial state set | One-dimension abstraction: Result | SAW: p | SAW: Result | SAW: Time
1 | $\alpha$ = [illegible], $\lambda = 0.4$ | $x_1 \in [-1.0, 1.0]$, $x_2 \in [-1.0, 1.0]$ | — | 50 | Yes | 72.913
2 | $\alpha = 1.1$, $\lambda = 1.8$ | $x_1 \in [-6.0, 6.0]$, $x_2 \in [-6.0, 6.0]$ | No ($X_0$: $x_1^2 + x_2^2 \le 1.9477$) | 30 | Yes | 10.360
3 | $\alpha = 2$, $\lambda = 0.37$ | $x_1 \in [-1.0, 2.0]$, $x_2 \in [-1.0, 1.0]$ | — | 100 | Yes | 183.30
4 | $\alpha = 1.4$, $\lambda$ = [illegible] | $x \in [-4.0, 4.0]$ | — | 30 | Yes | 80.613
5 | N/A | $x \in [-1.56, 1.32]$ | — | 100 | Yes | 4.713
6 | N/A | $x_1 \in [-5.0, 5.0]$, $x_2 \in [-5.0, 5.0]$ | — | 50 | Yes | 750.77


works effectively for a limited set of (m, K), e.g., when no consecutive deadline
misses are allowed. For general (m, K) constraints, one-dimension abstraction
performs much worse due to its over-conservatism. Furthermore, we can see that,
without exponential stability, the one-dimension abstraction based approach is
not applicable to benchmarks #5 and #6. Note that for benchmark #2,
one-dimension abstraction obtains a non-empty safe initial set $X_0$, which,
however, does not contain the given initial state set. Thus we use “No” instead
of “—” to represent this result. Conversely, for every example, our tool computes
a feasible $X_0$ that contains the initial state set (showing the initial state
set is safe), which we denote as “Yes”.


4.2 Impact of (m, K), Granularity, and Stepsize 


(m, K). We take benchmark #1 (Example 1 in Sect. 3) as an example and run
our tool under different (m, K) values. Figures 3a, 3b, 3c demonstrate that, for
this example, the size of the local safety region $\Gamma_S$ shrinks when K gets
larger. The size of the inductiveness region $\Gamma_I$ grows in contrast.
$\Gamma_S$ becomes the same as $\Gamma_I$ when K gets larger, in which case m is
the primary parameter that influences the size of $\Gamma_I$.


Granularity. We take benchmark #3 as an example, and run our tool with
different partition granularities. The results (Figs. 3d, 3e, 3f) show that
$\Gamma_I$ grows when p gets larger. The choice of p has significant impact on
the result (e.g., the user-defined initial state set cannot be verified when
p = 15).


Fig. 3. Results under different (m, K) values (3a, 3b, 3c) and different granularities
(3d, 3e, 3f; panel (f) uses p = 100). The green solid region is $\Gamma_I$. The
slashed region is $\Gamma_S$. The blue rectangle is the initial state set that
needs to be verified. (Color figure online)


Stepsize. We take benchmark #5 as an example, and run our tool with
different stepsizes of Flow*. With the same granularity p = 100, we get the safe
initial state set $\Gamma_I = [-1.56, 1.32]$ when step_size = 0.1, but $\Gamma_I$
is empty when step_size = 0.3. The computation times are 4.713 s and 1.835 s,
respectively.
Thus, we can see that there is a trade-off between the computational efficiency 
and the accuracy. 


5 Conclusion 


In this paper, we present a new tool SAW to compute a tight estimation of safe 
initial set for infinite-time safety verification of general nonlinear weakly-hard 
systems. The tool first discretizes the safe state set into grids. By constructing 
a reachability graph for the grids based on existing tools, the tool leverages 
graph theory and dynamic programming technique to compute the safe initial 
set. We demonstrate that our tool can significantly outperform the state-of-the-art
one-dimension abstraction approach, and analyze how different constraints
and parameters may affect the results of our tool. Future work includes further 
speedup of the reachability graph construction via parallel computing. 
Abstract. Reachability analysis is a critical tool for the formal verification
of dynamical systems and the synthesis of controllers for them. Due
to their computational complexity, many reachability analysis methods 
are restricted to systems with relatively small dimensions. One significant 
reason for such limitation is that those approaches, and their implementa- 
tions, are not designed to leverage parallelism. They use algorithms that 
are designed to run serially within one compute unit and they cannot
utilize widely available high-performance computing (HPC) platforms such
as many-core CPUs, GPUs and Cloud-computing services. 

This paper presents PIRK, a tool to efficiently compute reachable sets 
for general nonlinear systems of extremely high dimensions. PIRK can 
utilize HPC platforms for computing reachable sets for general high- 
dimensional non-linear systems. PIRK has been tested on several systems, 
with state dimensions up to 4 billion. The scalability of PIRK’s parallel 
implementations is found to be highly favorable. 


Keywords: Reachability analysis - ODE integration - Runge-Kutta 
method - Mixed monotonicity - Monte Carlo simulation - Parallel 
algorithms 


1 Introduction 


Applications of safety-critical cyber-physical systems (CPS) are growing due 
to emerging IoT technologies and the increasing availability of efficient com- 
puting devices. These include smart buildings, traffic networks, autonomous 
vehicles, truck platooning, and drone swarms, which require reliable bug-free 
software that performs in real-time and fulfills design requirements. Traditional
simulation/testing-based strategies may only find a small percentage of the
software defects and the repairs become much more costly as the system complexity grows.
Hence, in-development verification strategies are favorable since they reveal the 
faults in earlier stages, and guarantee that the designs satisfy the specifications 
as they evolve through the development cycle. Formal methods offer an attrac- 
tive alternative to testing- and simulation-based approaches, as they can verify 
whether the specifications for a CPS are satisfied for all possible behaviors from 
a set of the initial states of the system. Reachable sets characterize the states 
a system can reach in a given time range, starting from a certain initial set 
and subjected to certain inputs. They play an important role in several formal 
methods-based approaches to the verification and controller synthesis. An exam- 
ple of this is abstraction-based synthesis [1—4], in which reachable sets are used 
to construct a finite-state “abstraction” which is then used for formal synthesis. 

Computing an exact reachable set is generally not possible. Most practical 
methods resort to computing over-approximations or under-approximations of 
the reachable set, depending on the desired guarantee. Computing these approx- 
imations to a high degree of accuracy is still a computationally intensive task, 
particularly for high-dimensional systems. Many software tools have been cre- 
ated to address the various challenges of approximating reachable sets. Each of 
these tools uses different methods and leverages different system assumptions to 
achieve different goals related to computing reachable sets. For example, CORA 
[5] and SpaceEx [6] tools are designed to compute reachable sets of high accu- 
racy for very general classes of nonlinear systems, including hybrid ones. Some 
reachability analysis methods rely on specific features of dynamical systems, 
such as linearity of the dynamics or sparsity in the interconnection structure 
[7-9]. This allows computing the reachable sets in shorter time or for relatively 
high-dimensional systems. However, it limits the approach to smaller classes of 
applications, less practical specifications, or requires the use of less accurate (e.g., 
linearized) models. 

Other methods attack the computational complexity problem by computing
reachable set approximations from a limited class of set representations. An
example of limiting the set of allowed overapproximations is interval
reachability methods, in which reachable sets are approximated by Cartesian
products of intervals. Interval reachability methods allow for computing the reachable sets of
very general non-linear and high-dimensional systems in a short amount of time. 
They also pose mild constraints on the systems under consideration, usually only 
requiring some kind of boundedness constraint instead of a specific form for the 
system dynamics. Many reachability tools that are designed to scale well with 
state dimension focus on interval reachability methods: these include Flow* [10], 
CAPD [11], C2E2 [12], VNODE-LP [13], DynIbex [14], and TIRA [15]. 

Another avenue by which reachable set computation time can be reduced, 
which we believe has not been sufficiently explored, is the use of parallel com- 
puting. Although most reachability methods are presented as serial algorithms, 
many of them have some inherent parallelism that can be exploited. One example 
of a tool that exploits parallelism is XSpeed [16], which implements a parallelized
version of a support function-based reachability method. However, this parallel
method is limited to linear systems, and in some cases only linear systems with
invertible dynamics. Further, the parallelization is not suitable for massively
parallel hardware: only some of the work (sampling of the support functions)
is offloaded to the parallel device, so only a relatively small number of parallel
processing elements may be employed.

In this paper, we investigate the parallelism for three interval reachability 
analysis methods and introduce PIRK, the Parallel Interval Reachability Ker- 
nel. PIRK uses simulation-based reachability methods [17-19], which compute 
rigorous approximations to reachable sets by integrating one or more systems 
of ODEs. PIRK is developed in C++ and OpenCL as an open-source¹ kernel for
pFaces [20], a recently introduced acceleration ecosystem. This allows PIRK to
be run on a wide range of computing platforms, including CPU clusters, GPUs,
and hardware accelerators from any vendor, as well as cloud-based services like 
AWS. 

The user looking to use a reachability analysis tool for formal verification 
may choose from an abundance of options, as our brief review has shown. What 
PIRK offers in this choice is a tool that allows for massively parallel reachability 
analysis of high-dimensional systems with an application programming interface 
(API) to easily interface with other tools. To the best of our knowledge, PIRK is 
the first and the only tool that can compute reachable sets of general non-linear 
systems with dimensions beyond one billion. As we show later in Sect. 5, PIRK
computes the reachable set for a traffic network example with 4 billion dimensions
in only 44.7 min using a 96-core CPU in Amazon AWS Cloud. 


2 Interval Reachability Analysis 


Consider a nonlinear system with dynamics $\dot{x} = f(t, x, p)$ with state
$x \in \mathbb{R}^n$, a set of initial states $X_0$, a time interval
$[t_0, t_1]$, and a set of time-varying inputs $P$ defined over $[t_0, t_1]$.
Let $\Phi(t; t_0, x_0, p)$ denote the state of the system, at time $t$, of the
trajectory beginning at time $t_0$ at initial state $x_0$ under input $p$. We
assume the systems are continuous-time.

The finite-time forward reachable set is defined as

$$R_{t_0, t_1} = \{\Phi(t_1; t_0, x, p) \mid x \in X_0,\ p \in P\}.$$

Fig. 1. An example of an Interval Reachability problem for a nonlinear system.
Red rectangle: initial set. Blue rectangles: reachable sets for several final
times $t_1$. (Color figure online)

For the problem of interval reachability analysis, there are a few more
constraints on the problem structure. An interval set is a set of the form
$[\underline{a}, \overline{a}] = \{a : \underline{a} \le a \le \overline{a}\}$,
where $\le$ denotes the usual partial order on real vectors, that is the partial
order with respect to the positive orthant cone. The vectors $\underline{a}$ and
$\overline{a}$ are the lower and upper bounds respectively of the interval set.
An interval set can alternatively be described by its center
$a^* = \frac{1}{2}(\overline{a} + \underline{a})$ and half-width
$[a] = \frac{1}{2}(\overline{a} - \underline{a})$. In interval reachability
analysis, the initial set must be an interval, the input values are restricted
to an interval set, i.e. $p(t) \in [\underline{p}, \overline{p}]$, and the
reachable set approximation must also be an interval (Fig. 1). Furthermore,
certain methods for computing interval reachable sets require further
restrictions on the system dynamics, such as the state and input Jacobian
matrices being bounded or sign-stable.

¹ PIRK is publicly available at https://github.com/mkhaled87/pFaces-PIRK.


2.1 Methods to Compute Interval Reachable Sets 


PIRK computes interval reachable sets using three different methods, allowing 
for different levels of tightness and speed, and which allow for different amounts 
of additional problem data to be used. 

The Contraction/Growth Bound method [4,21,22] computes the reachable
set using component-wise contraction properties of the system. This method may
be applied to input-affine systems of the form $\dot{x} = f(t, x) + p$. The
growth and contraction properties of each component of the system are first
characterized by a contraction matrix $C$. The contraction matrix is a
component-wise generalization of the matrix measure of the Jacobian
$J_x = \partial f/\partial x$ [19,23], satisfying $C_{ii} \ge J_{x,ii}(t, x)$
for diagonal Jacobian elements $J_{x,ii}(t, x)$, and
$C_{ij} \ge |J_{x,ij}(t, x)|$ for off-diagonal Jacobian elements
$J_{x,ij}(t, x)$. The method constructs a reachable set over-approximation by
separately establishing its center and half-width. The center is found by
simulating the trajectory of the center of the initial set, that is, as
$\Phi(t_1; t_0, x^*, p^*)$. The half-width is found by integrating the growth
dynamics $\dot{r} = g(r, p) = Cr + [p]$, where
$[p] = \frac{1}{2}(\overline{p} - \underline{p})$, over $[t_0, t_1]$ with
initial condition $r(t_0) = [x] = \frac{1}{2}(\overline{x} - \underline{x})$.
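
The center and half-width computation described above can be sketched in plain Python. This is an illustration only: the helper names, the fixed-step RK4 integrator, and the test system are assumptions made here, and floating-point integration is not a rigorous enclosure, unlike a validated implementation.

```python
import math


def rk4(f, y, t0, t1, steps=100):
    """Classical fourth-order Runge-Kutta for y' = f(t, y), y a list."""
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, [a + h / 2 * b for a, b in zip(y, k1)])
        k3 = f(t + h / 2, [a + h / 2 * b for a, b in zip(y, k2)])
        k4 = f(t + h, [a + h * b for a, b in zip(y, k3)])
        y = [a + h / 6 * (b + 2 * c + 2 * d + e)
             for a, b, c, d, e in zip(y, k1, k2, k3, k4)]
        t += h
    return y


def growth_bound_box(f, C, x_lo, x_hi, p_lo, p_hi, t0, t1):
    """Interval [center - r, center + r] for x' = f(t, x) + p."""
    center = [(a + b) / 2 for a, b in zip(x_lo, x_hi)]
    p_mid = [(a + b) / 2 for a, b in zip(p_lo, p_hi)]
    r0 = [(b - a) / 2 for a, b in zip(x_lo, x_hi)]       # half-width [x]
    p_half = [(b - a) / 2 for a, b in zip(p_lo, p_hi)]   # half-width [p]
    # Center: trajectory of the center state under the center input.
    c1 = rk4(lambda t, x: [fi + pi for fi, pi in zip(f(t, x), p_mid)],
             center, t0, t1)
    # Half-width: growth dynamics r' = C r + [p].
    r1 = rk4(lambda t, r: [sum(cij * rj for cij, rj in zip(row, r)) + ph
                           for row, ph in zip(C, p_half)],
             r0, t0, t1)
    return ([c - r for c, r in zip(c1, r1)],
            [c + r for c, r in zip(c1, r1)])


# Decoupled stable system: the box shrinks by exp(-1) and exp(-2).
lo, hi = growth_bound_box(lambda t, x: [-x[0], -2 * x[1]],
                          [[-1, 0], [0, -2]],
                          [-1, -1], [1, 1], [0, 0], [0, 0], 0.0, 1.0)
print(lo, hi)
```

For this decoupled example the contraction matrix equals the Jacobian, so the growth dynamics reproduce the exact half-widths.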

The Mixed-Monotonicity method [24] computes the reachable set by separating
the increasing and decreasing portions of the system dynamics in an auxiliary
system called the embedding system whose state dimension is twice that of the
original system [25]. The embedding system is constructed using a decomposition
function $d(t, x, p, \hat{x}, \hat{p})$, which encodes the increasing and
decreasing parts of the system dynamics and satisfies
$d(t, x, p, x, p) = f(t, x, p)$. The evaluation of a single trajectory of the
embedding system can be used to find a reachable set over-approximation for the
original system.
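
A small worked instance may help make the embedding system concrete. The sketch below uses a hypothetical time-invariant system x1' = -x1 + x2, x2' = -x2 - x1 + p, for which a valid decomposition function is easy to write by hand; the system, the function names, and the fixed-step RK4 integrator are all assumptions of this example, and the floating-point result is not a rigorous enclosure.

```python
def rk4(f, y, t0, t1, steps=200):
    """Classical fourth-order Runge-Kutta for y' = f(t, y), y a list."""
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, [a + h / 2 * b for a, b in zip(y, k1)])
        k3 = f(t + h / 2, [a + h / 2 * b for a, b in zip(y, k2)])
        k4 = f(t + h, [a + h * b for a, b in zip(y, k3)])
        y = [a + h / 6 * (b + 2 * c + 2 * d + e)
             for a, b, c, d, e in zip(y, k1, k2, k3, k4)]
        t += h
    return y


def d(x, p, x_hat, p_hat):
    """Decomposition function of the example system (time argument omitted).

    x2 increases x1', so it is read from x; x1 decreases x2', so it is read
    from x_hat; p increases x2', so it is read from p. d(x, p, x, p) = f(x, p).
    """
    return [-x[0] + x[1], -x[1] - x_hat[0] + p]


def mixed_monotone_box(d, x_lo, x_hi, p_lo, p_hi, t0, t1):
    """Integrate one trajectory of the 2n-dimensional embedding system."""
    n = len(x_lo)

    def rhs(t, z):
        lo, hi = z[:n], z[n:]
        # Lower bounds use (lo, p_lo | hi, p_hi); upper bounds swap the roles.
        return d(lo, p_lo, hi, p_hi) + d(hi, p_hi, lo, p_lo)

    z = rk4(rhs, list(x_lo) + list(x_hi), t0, t1)
    return z[:n], z[n:]


lo, hi = mixed_monotone_box(d, [0.0, 0.0], [1.0, 1.0], 0.0, 0.5, 0.0, 1.0)
print(lo, hi)
```

A single integration of the four-dimensional embedding state yields both bounds at once, which is what makes the method cheap compared with simulating many trajectories.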

The Monte Carlo method computes a probabilistic approximation to the
reachable set by evaluating the trajectories of a finite number $m$ of sample
pairs $(x_0^{(i)}, p^{(i)})$ drawn from the initial set and input set, and selecting the
smallest interval that contains the final points of the trajectories. Unlike the other
two methods, the Monte Carlo method is restricted to constant-valued inputs, i.e.
inputs of the form $p(t) = p$, where $p \in [\underline{p}, \overline{p}]$. Each sampled initial state
$x_0^{(i)}$ is integrated over $[t_0, t_1]$ with its input $p^{(i)}$ to yield a final state $x_1^{(i)}$.
The interval reachable set is then approximated by the elementwise minimum and
maximum of the $x_1^{(i)}$. This approximation satisfies a probabilistic guarantee of
correctness, provided that enough sample states are chosen [26]. Let
$[\underline{R}, \overline{R}]$ be the approximated reachable set, $\epsilon, \delta \in (0,1)$, and
$m \geq \frac{2n}{\epsilon} \log\left(\frac{2n}{\delta}\right)$. Then, with probability
$1 - \delta$, the approximation $[\underline{R}, \overline{R}]$ satisfies
$P(R_{t_0,t_1} \setminus [\underline{R}, \overline{R}]) \leq \epsilon$, where $P(A)$ denotes
the probability that a sampled initial state will yield a final state in the set $A$,
and $\setminus$ denotes set difference. The probability that a sampled initial state will be
sent to a state outside the estimate (the "accuracy" of the estimate) is quantified
by $\epsilon$. Improved accuracy (lower $\epsilon$) increases the sample size as $O(1/\epsilon)$. The
probability that running the algorithm will fail to give an estimate satisfying the
inequality (the "confidence") is quantified by $\delta$. Improved confidence (lower $\delta$)
increases the sample size by $O(\log(1/\delta))$.
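
The sampling procedure can be sketched in a few lines of Python. The sketch
makes several assumptions that are not taken from the paper: the sample bound is
assumed to have the form m >= (2n/eps) log(2n/delta), which matches the stated
O(1/eps) and O(log(1/delta)) scaling; samples are drawn uniformly; and the
scalar test system x' = -x + p and all function names are hypothetical.

```python
import math
import random


def sample_size(n, eps, delta):
    """Assumed sample bound m >= (2n/eps) * log(2n/delta), rounded up."""
    return math.ceil((2 * n / eps) * math.log(2 * n / delta))


def rk4(f, y0, t0, t1, steps):
    """Integrate y' = f(t, y) from t0 to t1 with classical RK4 (y is a list)."""
    h = (t1 - t0) / steps
    t, y = t0, list(y0)
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, [a + h / 2 * b for a, b in zip(y, k1)])
        k3 = f(t + h / 2, [a + h / 2 * b for a, b in zip(y, k2)])
        k4 = f(t + h, [a + h * b for a, b in zip(y, k3)])
        y = [a + h / 6 * (p + 2 * q + 2 * r + s)
             for a, p, q, r, s in zip(y, k1, k2, k3, k4)]
        t += h
    return y


def monte_carlo_interval(f, x_lo, x_hi, p_lo, p_hi, t0, t1, m, steps=100, seed=0):
    """Elementwise min/max of the final states of m sampled trajectories."""
    rng = random.Random(seed)
    n = len(x_lo)
    lo, hi = [math.inf] * n, [-math.inf] * n
    for _ in range(m):
        x0 = [rng.uniform(a, b) for a, b in zip(x_lo, x_hi)]
        p = [rng.uniform(a, b) for a, b in zip(p_lo, p_hi)]  # constant input
        x1 = rk4(lambda t, x: f(t, x, p), x0, t0, t1, steps)
        lo = [min(a, b) for a, b in zip(lo, x1)]
        hi = [max(a, b) for a, b in zip(hi, x1)]
    return lo, hi


# Hypothetical scalar system x' = -x + p with x(0) in [0, 1] and p in [0, 0.5].
m = sample_size(n=1, eps=0.05, delta=0.05)
lo, hi = monte_carlo_interval(lambda t, x, p: [-x[0] + p[0]],
                              [0.0], [1.0], [0.0], [0.5], 0.0, 1.0, m)
```

Only the running minima and maxima are stored, so memory use does not grow with
the number of samples.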


3 Parallelization 


The bulk of the computational work in each method is spent in ODE integration.
Hence, the most effective way to parallelize the three methods is to use a
parallel ODE integration method. Several such methods are available; many of
the popular ones are parallel extensions of Runge-Kutta integration methods,
which are the most popular serial methods for ODE integration.

PIRK takes advantage of the task-level parallelism in the Runge-Kutta equa- 
tions by evaluating each state dimension in parallel. This parallelization scheme 
is called parallelization across space [27]. PIRK specifically uses a space-parallel 
version of the fourth-order Runge-Kutta method, or space-parallel RK4 for 
brevity. In space-parallel RK4, each parallel thread is assigned a different state 
variable to evaluate the intermediate update equations. After each intermediate 
step, the threads must synchronize to construct the updated state in global mem- 
ory. Space-parallel RK4 can use as many parallel computation elements as there 
are state variables: since PIRK’s goal is to compute reachable sets for extremely 
high-dimensional systems, this is sufficient in most cases. 
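
The scheme can be sketched in Python as follows. This is only an illustration
of the idea under stated assumptions: the chain system in f_component, the
chunking of state indices, and the use of a thread pool are hypothetical choices,
and Python threads will not deliver the speedup of real parallel hardware.

```python
from concurrent.futures import ThreadPoolExecutor


def f_component(i, t, x):
    """i-th component of a toy chain system: x_i' = x_{i-1} - x_i (x_{-1} = 0)."""
    upstream = x[i - 1] if i > 0 else 0.0
    return upstream - x[i]


def parallel_eval(pool, chunks, t, x):
    """Evaluate f(t, x), one task per chunk of state indices, then gather."""
    def work(chunk):
        return [f_component(i, t, x) for i in chunk]
    result = []
    # Collecting every result of map() acts as the synchronization point:
    # the next stage starts only after all components are available.
    for part in pool.map(work, chunks):
        result.extend(part)
    return result


def space_parallel_rk4(x0, t0, t1, steps, workers):
    """RK4 in which each intermediate stage is evaluated across state indices."""
    n = len(x0)
    size = -(-n // workers)  # ceil(n / workers) indices per worker
    chunks = [range(s, min(s + size, n)) for s in range(0, n, size)]
    h = (t1 - t0) / steps
    t, x = t0, list(x0)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(steps):
            k1 = parallel_eval(pool, chunks, t, x)
            k2 = parallel_eval(pool, chunks, t + h / 2,
                               [a + h / 2 * b for a, b in zip(x, k1)])
            k3 = parallel_eval(pool, chunks, t + h / 2,
                               [a + h / 2 * b for a, b in zip(x, k2)])
            k4 = parallel_eval(pool, chunks, t + h,
                               [a + h * b for a, b in zip(x, k3)])
            # For brevity the state update is done serially in this sketch.
            x = [a + h / 6 * (p + 2 * q + 2 * r + s)
                 for a, p, q, r, s in zip(x, k1, k2, k3, k4)]
            t += h
    return x
```

With workers = 1 the routine degenerates to ordinary serial RK4, and using more
workers than state variables brings no further benefit.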

The space-parallel scheme is not hardware-specific, and may be used with any 
parallel computing platform. PIRK is similarly hardware-agnostic: the pFaces 
ecosystem, for which PIRK is a kernel, provides a common interface to run on 
a variety of heterogeneous parallel computing platforms. The only difference 
between platforms that affects PIRK is the number of available parallel processing 
elements (PEs). 


4 Complexity of the Parallelized Methods 


The parallelized implementations of the three reachability methods described 
in Sect. 2.1 use space-parallel RK4 to perform almost all computations other 
than setting up initial conditions. We can therefore find the time and memory 
complexity of each method by analyzing the complexity of space-parallel RK4 
and counting the number of times each method uses it. 

For a system with $n$ dimensions, space-parallel RK4 scales linearly as the
number of PEs (denoted by $P$) increases. In a computer with a single PE (i.e.,
$P = 1$), the algorithm reduces to the original serial algorithm. Now suppose
that a parallel computer has $P \leq n$ PEs of the same type. We assume a
computational model under which instruction overhead and latency from thread
synchronization are negligible, memory space has equal access time from all
processing elements, and the number of parallel jobs can be evenly distributed
among the $P$ processing elements.² Under this parallel random-access machine
model [28], the time complexity of space-parallel RK4 is reduced by a factor of
$P$: each PE is responsible for computing $n/P$ components of the state vector.
Therefore, for fixed initial and final times $t_0$ and $t_1$, the time complexity of the
algorithm is $O(n/P)$.

The parallel version of the contraction/growth bound method uses space- 
parallel RK4 twice. First, it is used to compute the solution of the system’s 
ODE $f$ for the center of the initial set $X_0$. Then, it is used to compute the
growth/contraction of the initial set $X_0$ by solving the ODE $g$ of the growth
dynamics. Since this method uses a fixed number of calls of space-parallel RK4,
its time complexity is also $O(n/P)$ for a given $t_0$ and $t_1$.

The parallelized implementation of the mixed-monotonicity method uses 
space-parallel RK4 only once, in order to integrate the 2n-dimensional embed- 
ding system. This means that the mixed-monotonicity method also has a time 
complexity of $O(n/P)$ for fixed $t_0$ and $t_1$. However, the mixed-monotonicity method
requires twice as much memory as the growth bound method, since it runs space- 
parallel RK4 on a system of dimension 2n. 

The parallelized implementation of the Monte Carlo method uses space- 
parallel RK4 m times, once for each of the m sampled initial states. The imple- 
mentation uses two levels of parallelization. The first level is a set of parallel 
threads over the samples used for simulations. Then, within each thread, another
parallel set of threads is launched by space-parallel RK4. This is realized as
one parallel job of $m \times n$ threads. Consequently, the Monte Carlo method has
a complexity of $O(mn/P)$. Since only the elementwise minima and maxima of the
sampled states need to be stored, this method only requires as much memory as 
the growth bound method. 


Remark 1. Pseudocode for each parallel algorithm and a detailed discussion of
their time and space complexities are provided in an extended version of this 
paper [29]. The extended version also contains additional details for the case 
studies that will be presented in the next section. 


5 Case Studies 


In each of the case studies to follow, we report the time it takes PIRK to compute
reachable sets for systems of varying dimension using all three of its methods on
a variety of parallel computing platforms. We perform some of the same tests
using the serial tool TIRA, to measure the speedup gained by PIRK's ability to
use massively parallel hardware.

² While these non-idealities will be present in real systems and slow down computation,
they should not affect the asymptotic complexity.

Fig. 2. Logarithmic plots of the results for speed tests of the traffic model (first row)
and the quadrotor swarm (second row). Speed test results for the serial interval
reachability toolbox TIRA are also shown for the traffic model.

We set a time limit of 1 h for all of the targeted case studies, and report
the maximum dimensions that could be reached under this limit. The Monte
Carlo method is given probabilistic parameters $\epsilon = \delta = 0.05$ in each case study
where it is used. We use four AWS machines for the computations with PIRK: 
m4.10xlarge which has a CPU with 40 cores, c5.24xlarge which has a CPU 
with 96 cores, g3.4xlarge which has a GPU with 2048 cores, and p3.2xlarge 
which has a GPU with 5120 cores. For the computations with TIRA, we used a 
machine with a 3.6 GHz Intel i7 CPU. 


5.1 n-link Road Traffic Model 


We consider the road traffic analysis problem reported in [30], a proposed bench- 
mark for formal controller synthesis. We are interested in the density of cars along 
a single one-way lane. The lane is divided into n segments, and the density of cars 
in each segment is a state variable. The continuous-time dynamics are derived 
from a spatially discretized version of the Cell Transmission Model [31]. This is 
a nonlinear system with sparse coupling between state variables. 

The results of the speed test are shown in the first row of Fig. 2. The
machines m4.10xlarge and c5.24xlarge reach up to 2 billion and 4 billion
dimensions, respectively, using the growth/contraction method, in 47.3 min and
44.7 min, respectively. Due to memory limitations of the GPUs, the machines
g3.4xlarge and p3.2xlarge both reach up to 400 million dimensions, in 106 s and
11 s, respectively.

The relative improvement of PIRK’s computation time over TIRA’s is sig- 
nificantly larger for the growth bound method than for the other two. This 
difference stems from how each tool computes the half-width of the reachable 
set from the radius dynamics. TIRA solves the radius dynamics by computing 
the full matrix exponential using MATLAB’s expm, whereas PIRK directly inte- 
grates the dynamics using parallel Runge-Kutta. This caveat applies to Sect. 5.2 
as well. 


5.2 Quadrotor Swarm 


The second test system is a swarm of K identical quadrotors with nonlinear 
dynamics. The system dynamics of each quadrotor model are derived in a similar
way to the model used in the ARCH-COMP'18 competition [32], with the
added simplification of a small angle approximation in the angular dynamics
and the neglect of Coriolis force terms. A derivation of both models is available
in [33]. Similar to the n-link traffic model, this system is convenient for
scaling: a system consisting of one quadrotor can be expressed with 12 states, so
the state dimension of the swarm system is n = 12K. While this reachability 
problem could be decomposed into K separate reachability problems which can 
be solved separately, we solve the entire 12K-dimensional problem as a whole to 
demonstrate PIRK’s ability to make use of sparse interconnection. 

The results of the speed test are shown in Fig. 2 (second row). Using the
growth/contraction method, the machines m4.10xlarge and c5.24xlarge reach
up to 1.8 billion and 3.6 billion dimensions in 48 min and 32 min, respectively.
The machines g3.4xlarge and p3.2xlarge both reach up to 120 million
dimensions, in 10.6 min and 46 s, respectively.


5.3 Quadrotor Swarm with Artificial Potential Field 


The third test system is a modification of the quadrotor swarm system which 
adds interactions between the quadrotors. In addition to the quadrotor dynamics 
described in Sect. 5.2, this model augments each quadrotor with an artificial
potential field to guide it to the origin while avoiding collisions. This controller 
applies nonlinear force terms to the quadrotor dynamics that seek to minimize 
an artificial potential U that depends on the position of all of the quadrotors. 
Due to the interaction of the state variables in the force terms arising from the 
potential field, this system has a dense Jacobian. In particular, at least 25% of 
the Jacobian elements will be nonzero for any number of quadrotors. 

Table 1 shows the times of running PIRK using this system on the four
machines m4.10xlarge, c5.24xlarge, g3.4xlarge and p3.2xlarge in Amazon
AWS. Due to the high density of this example, we focus on the memory-light
growth bound and Monte Carlo methods. PIRK computed the reach sets
of systems up to 120,000 state variables (i.e., 10,000 quadrotors). Up to 1,200
states, all machines solve the problems in less than one second. Some of the
machines lack the required memory to solve the problems requiring large memory
(e.g., 27.7 GB of memory is required to compute the reach set of the system
with 120,000 state variables using the growth bound method).


Table 1. Results for running PIRK to compute the reach set of the quadrotor swarm
with artificial potential field. "N/M" means that the machine did not have enough
memory to compute the reachable set. GB denotes the growth bound method and MC
the Monte Carlo method. Times are in seconds unless noted otherwise.

Method | No. of states | Memory (MB) | m4.10xlarge | c5.24xlarge | g3.4xlarge | p3.2xlarge
GB     | 1200          | 2.8         | < 1.0       | < 1.0       | < 1.0      | < 1.0
GB     | 12000         | 275.3       | < 1.0       | < 1.0       | < 1.0      | < 1.0
GB     | 120000        | 27,473.1    | 69.6        | 68.3        | N/M        | N/M
MC     | 1200          | 45.7        | 1.0         | < 1.0       | 2.0        | < 1.0
MC     | 12000         | 457.5       | 56.8        | 23.7        | 233.1      | 40.6
MC     | 120000        | 4577.6      | > 2 h       | 3091.8      | N/M        | 5081.0


5.4 Heat Diffusion 


The fourth test system is a model for the diffusion of heat in a 3-dimensional 
cube. The model is based on a benchmark used in [7] to test a method for 
numerical verification of affine systems. A model of the form $\dot{x} = f(t, x, p)$ which
approximates the heat transfer through the cube according to the heat equation
can be obtained by discretizing the cube into an $\ell \times \ell \times \ell$ grid, yielding a system
with $\ell^3$ states. The temperature at each grid point is taken as a state variable.
Each spatial derivative is replaced with a finite-difference approximation. Since 
the heat equation is a linear PDE, the discretized system is linear. 
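
A minimal Python sketch of such a discretization is given below. The source
does not specify the boundary conditions, diffusivity, or grid spacing used in
the benchmark, so the insulated boundary, the parameters alpha and spacing, and
the function name heat_rhs are assumptions made for illustration.

```python
def heat_rhs(temps, ell, alpha=1.0, spacing=1.0):
    """Right-hand side x' = f(x) of the heat equation on an ell x ell x ell grid.

    temps is a flat list of ell**3 temperatures, with grid point (i, j, k)
    stored at index (i * ell + j) * ell + k. Each second spatial derivative is
    replaced by a central finite difference. Assumption: insulated boundary,
    i.e. neighbours outside the cube are skipped.
    """
    def idx(i, j, k):
        return (i * ell + j) * ell + k

    neighbours = ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
                  (0, -1, 0), (0, 0, 1), (0, 0, -1))
    out = [0.0] * ell ** 3
    for i in range(ell):
        for j in range(ell):
            for k in range(ell):
                here = temps[idx(i, j, k)]
                acc = 0.0
                for di, dj, dk in neighbours:
                    a, b, c = i + di, j + dj, k + dk
                    if 0 <= a < ell and 0 <= b < ell and 0 <= c < ell:
                        acc += temps[idx(a, b, c)] - here
                out[idx(i, j, k)] = alpha * acc / spacing ** 2
    return out
```

Each state derivative depends on at most six neighbouring states, so the
Jacobian of the discretized system is sparse even though the state dimension
grows as the cube of the grid resolution.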

We take a fixed state dimension of $n = 10^9$ by fixing $\ell = 1000$. Integration
takes place over $[t_0, t_1] = [0, 20]$ with time step size $h = 0.02$. Using the growth
bound method, PIRK solves the problem on m4.10xlarge in 472 min, and in
350.2 min on c5.24xlarge. This is faster than the time reported in [7] (30 h)
using the same machine.


5.5 Overtaking Maneuver with a Single-Track Vehicle 


The remaining case studies focus on models of practical importance with low
state dimension. Although PIRK is designed to perform well on high-dimensional
systems, it is also effective at quickly computing reachable sets for
low-dimensional systems, for applications that require many reachable sets. The
first such case study is a single-track vehicle model with seven states, presented
in [34].

We fix an input that performs a maneuver to overtake an obstacle in the
middle lane of a 3-lane highway. To verify that the maneuver is completed
safely, we compute reachable sets over a range of time points and ensure that the
reachable sets do not intersect any obstacles. We consider a step size of 0.005 s
in a time window between 0 and 6.5 s. We compute one reachable set at each
time step, resulting in a "reachable tube" comprising 1300 reachable sets. PIRK
computed the reachable tube in 0.25 s using the growth bound method on an i7
CPU (Fig. 3).


Fig. 3. Reachable tube for the single-track vehicle.


5.6 Performance on ARCH Benchmarks 


In order to compare PIRK’s performance to existing tools, we tested PIRK’s 
growth bound implementation on three systems from the ARCH-COMP’18 cat- 
egory report for systems with nonlinear dynamics [32]. This report contains 
benchmark data from several popular reachability analysis tools (C2E2, CORA, 
Flow*, Isabelle, SpaceEx, and SymReach) on nonlinear reachability problems 
with state dimensions between 2 and 12. 


Table 2. Results from running PIRK (growth bound method) to compute the reach
sets for the examples reported in the ARCH-2018 competition.

Benchmark model        | PIRK | CORA | CORA/SX | C2E2 | Flow* | Isabelle | SymReach
Van der Pol (2 states) | 0.13 | 2.3  | 0.6     | 38.5 | 1.5   | 15       | 17.14
Laub-Loomis (7 states) | 0.04 | 0.82 | 0.85    | 0.12 | 4.5   | 10       | 1.93
Quadrotor (12 states)  | 0.01 | 5.2  | 1.5     | -    | 5.9   | 30       | 2.96

Table 2 compares the computation times for PIRK on the three systems to
those reported by other tools in [32]. All times are in seconds. PIRK ran on an i9 
CPU, while the others ran on i7 and i5: see [32] for more hardware details. PIRK 
solves each of the benchmark problems faster than the other tools. Both of the 
i7 and i9 processors used have 6 to 8 cores: the advantage of PIRK is its ability 
to utilize all available cores. 


6 Conclusion 


Using a simple parallelization of interval reachability analysis techniques, PIRK 
is able to compute reachable sets for nonlinear systems faster and at higher 
dimensions than many existing tools. This performance increase comes from 
PIRK’s ability to use massively parallel hardware such as GPUs and CPU clusters, 
as well as the use of parallelizable simulation-based methods. Future work will 
focus on improving the memory usage of the mixed-monotonicity and Monte
Carlo-based methods, including an investigation of adaptive sampling strategies,
and on using PIRK as a helper tool to synthesize controllers for high-dimensional 
systems. 

Abstract. Boolean networks (BNs) provide an effective modelling tool
for various phenomena from science and engineering. Any long-term 
behaviour of a BN eventually converges to a so-called attractor. Depend- 
ing on various logical parameters, the structure and quality of attractors 
can undergo a significant change, known as a bifurcation. We present 
a tool for analysing bifurcations in asynchronous parametrised Boolean 
networks. To fight the state-space and parameter-space explosion prob- 
lem the tool uses a parallel semi-symbolic algorithm. 


Keywords: Boolean networks · Attractors · Bifurcation analysis


1 Introduction 


Boolean networks (BNs) provide an effective mathematical tool to model compu- 
tational processes and other phenomena from science and engineering. BNs rep- 
resent a generalisation of other relevant mathematical models, which appeared 
previously as cellular automata (CA), suggested by Wolfram [39] for computa- 
tion modelling, or formal genetic nets [24] and Thomas networks [37], proposed 
for gene regulatory networks. This gives an idea of the versatility of BNs in
applications across science (mathematics, physics, chemistry, biology, ecology,
etc.) and engineering (computation, artificial intelligence, electronics, circuits, etc.).

The development of formal methods for analysis and synthesis of Boolean net- 
works has recently attracted a lot of attention [11,18,20,28,36]. In this paper, we 
are primarily interested in BN models for computational systems biology [29]. In 
general, biological processes emerge from complex inter- and intra-cellular
interactions and they cannot be sufficiently understood and controlled without 
the help of powerful computer-aided modelling and analysis methods [38]. BNs 
serve an important purpose of describing overall interactions within a living cell 
at an appropriate level of abstraction and they provide a systematic approach 
to model crucial states of cell dynamics — so-called phenotypes [22]. 

The level of abstraction provided by BNs makes them an important tool for
the design of targeted therapeutic procedures such as cell reprogramming [36],
which is based on changing one cell phenotype to another, allowing regeneration
of tissues or neurons [21]. Since phenotypes are determined by long-term
behaviour of biological systems, fully automatised identification of phenotypes by
employing BN models is a necessary step towards the future of modern medicine.
Because sufficiently detailed (mechanistic) information on biological processes is
persistently lacking, there is a need to work with models involving
uncertain (or insufficient) knowledge. In this paper, we present a unique tool that
makes a significant contribution towards fully automatised analysis of long-term
behaviour of BN models with uncertain knowledge.

We start with giving some intuition on BNs. A BN consists of a set of Boolean 
variables whose state is determined by other variables in the network through 
a set of Boolean update functions assigned to the variables (different update 
functions can be assigned to different variables) and regulations placed on them. 
If at each point of time all the update functions are applied simultaneously, we
speak of synchronous dynamics; if only one of the update functions is chosen
non-deterministically to modify the corresponding Boolean variable, we speak
of asynchronous dynamics. In this paper we consider asynchronous Boolean
networks only.

In real-world applications, the update functions for some of the variables are 
typically (partially) unknown and are represented as logical parameters of the 
network. We speak of parametrised Boolean networks [40] in this case. If all the 
parameters are fixed to a concrete Boolean function, a parametrised BN turns 
into a (non-parametrised) BN. 

The long-term behaviour of a BN, starting from an initial state, has three 
possible outcomes. In the first, the network evolves to
a single stable state. Such states are called fixed points, point attractors, or
stable states. In the second, the network periodically oscillates
through a finite sequence of states: an oscillating attractor or attractive cycle
(the discrete equivalent of a limit cycle in continuous systems). The third is
what we call a disordered attractor (or chaotic oscillation [32]), an attractor that
is neither stable nor periodically oscillating and in which the system may behave
unpredictably, due to the nondeterminism of the asynchronous semantics of BNs.
Attractors are particularly relevant in the context of biological modelling as they 
are used to represent differentiated cellular types or tissues (in the case of fixed 
points) [2] and biological rhythms or oscillations (in the case of cycles) [17]. 

The set of network states that converge to the same attractor forms the 
basin of attraction of that attractor [7]. Attractors (and their basins) are dis- 
joint entities and the state space is compartmentalised by imaginary “attractor 
boundaries”. The entire dynamics of a Boolean network can be represented as a 
state transition system in which the trajectories from initial states are depicted, 
revealing the basins of attraction and associated attractors. We call such a rep- 
resentation the attractor landscape of the network [13]. 

In parametrised BNs the attractor landscape changes as the parameters are 
varied. Some of these changes may lead to a qualitatively different landscape 
(defined, e.g., in the count and/or quality of attractors). Such a qualitative 
change is called a bifurcation and the values of parameters for which it occurs
are called bifurcation points. Determining (all) bifurcation points for a network, 
called attractor bifurcation analysis, is an important task in the analysis of 
BNs [4]. 

While BN models are intuitive, mathematically simple to describe, and sup- 
ported by analytical methods [12], analysis of large models appearing in real cases 
is severely limited by the lack of robust computational tools running efficiently on 
high-performance hardware. Several computational tools have been developed for 
construction, visualisation and analysis of attractors in non-parametrised BNs. 
Amongst them, the established tools include ATLANTIS [34], Bio Model Ana- 
lyzer (BMA) [6], BoolNet [31], PyBoolNet [27], Inet [7], The Cell Collective [23], 
CellNet Analyzer [25], and ASSA-PBN [30]. Another group of existing tools tar- 
gets the parameter synthesis problem for parametrised BNs. The most prominent 
tools here are GRNMC [20], GINsim [10] (indirectly through NuSMV [14]), and 
TREMPPI [35]. In general, parameter synthesis tools can be used to identify
parameters producing a specified long-term behaviour (depending on the logics
employed); however, they do not provide a sufficient solution for identification
and classification of all attractors in the system. Finally, it is worth noting that
several tools aimed at the control of cell behaviour through BNs (i.e., driving
a cell into the desired state) have recently appeared. A well-known
representative of these tools is ViSiBooL [33].

To the best of our knowledge, none of the existing tools is capable of perform- 
ing attractor bifurcation analysis in parametrised models. Bifurcation analysis 
has been recently recognised as a fundamental approach that provides a new 
framework for understanding the behaviour of biological networks. The ability 
to make a dramatic change in system behaviour is often essential to organism 
function, and bifurcations are therefore ubiquitous in biological networks such 
as the switches of the cell cycle. AEON aims to fill this gap
in the existing tool support for the analysis of Boolean network models.

AEON builds on methods and algorithms for asynchronous parametrised BNs 
we have introduced in our previous research [1,3-5]. To deal with the state-space 
and parameter-space explosion problem, the tool implements a shared-memory 
parallel semi-symbolic algorithm. The results the tool provides can
be used, for example, to design "wet" experiments, to better understand
the system's dynamics, or to control or re-program the system. As attractors
model phenotypes, one of the most urgent needs for computer-aided support
of the kind AEON provides lies in therapeutic applications.

We believe that attractor bifurcation computed by AEON will shift the cur- 
rent technology toward a comprehensive method when integrated with tools 
aimed at control or other analysis methods. 


2 Attractors in Parametrised Boolean Networks 


In this section, we precisely define the problem of attractor bifurcation
analysis. We also give an overview of the technical background needed to
describe the algorithmic solution and its implementation. More details can be
found in [4].

A Boolean network (BN) consists of a finite set of state variables $V$ (whose
elements we denote by $A$, $B$, ...), a set of regulations $R \subseteq V \times V$, and a family
of Boolean update functions $F = \{F_A \mid A \in V\}$. If $(B, A) \in R$, we say that $B$ is
a regulator of $A$. For each $A \in V$, we call the set $C(A) = \{B \in V \mid (B, A) \in R\}$ of
its regulators the context of $A$. A state of the BN is an assignment of Boolean
values to the variables, i.e. a function $V \to \{0,1\}$. The type signature of each
update function $F_A$ is given by the context of $A$ as $F_A : \{0,1\}^{C(A)} \to \{0,1\}$.

In Boolean networks, one often describes various properties of the network
regulations. Here, we focus on the three most basic types of regulation. We say that
$(A, B) \in R$ is observable if there exists a state where changing the value of $A$
also changes the value of $F_B$. In the tool, edges that might be non-observable are
drawn using dashed lines.

We say that a regulation $(A, B) \in R$ is activating if by increasing $A$, one
cannot decrease the value of $F_B$. Symmetrically, the regulation is inhibiting if by
increasing $A$, one cannot increase the value of $F_B$. In the tool, activating edges are
denoted using green colour and sharp arrow tips, inhibiting edges are denoted
using red colour and flat arrow tips, and edges that might be neither activating
nor inhibiting are denoted using grey colour.




Fig. 1. Illustration of a (parametrised) BN and its state transition graph. (Color figure 
online) 


Let us now consider an example of a BN with $V = \{A, B, C\}$, the regulations
$R$ as denoted in Fig. 1 (left) and the update functions: $F_A = A \lor \lnot B \lor \lnot C$,
$F_B = A \lor C$, $F_C = \lnot B$. We can see that all regulations are observable and the colour
(and shape) of the arrows respects the properties of activation and inhibition,
e.g. $(B, A)$ is an inhibition, because by increasing the value of $B$, we cannot increase
the value of $F_A$.

The semantics of a Boolean network is given as a directed state transition
graph. The state space of the graph is the set of all possible assignments of
Boolean values to the variables, i.e. $\{0,1\}^V$. We consider the state of the Boolean
network to evolve in an asynchronous manner, i.e. each variable is updated
independently. We thus add a transition $s \to t$ if $s \neq t$ and if there exists
a variable $A$ such that $t(A) = F_A(s)$ and $t(X) = s(X)$ for all $X \in V \setminus \{A\}$. We
also use the notation $\to^*$ to denote the reflexive and transitive closure of $\to$,
i.e. $s \to^* t$ means that the state $t$ is reachable from the state $s$.

The semantics of the BN from our example is illustrated in Fig. 1 (middle). 
The states are represented as Boolean triples denoting the values assigned to the 
variables A, B, and C, respectively. 

The long-term behaviour that we are interested in is captured by the notion 
of attractors. In discrete-state systems, attractors are represented by terminal 
strongly connected components (TSCCs) of the graph. A TSCC is a maximal
set of states $S$ such that for all $s, t \in S$, $s \to^* t$, and for all $s \in S$, $s \to t$ implies
$t \in S$.
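
The asynchronous semantics and the TSCC condition can be checked on the
three-variable example by brute force. The Python sketch below enumerates all
eight states explicitly; the function names are hypothetical, and the explicit
enumeration is for illustration only (it is not the symbolic algorithm used by
the tool).

```python
from itertools import product

VARS = ("A", "B", "C")

# Update functions of the example BN: F_A = A or not B or not C,
# F_B = A or C, F_C = not B.
UPDATE = {
    "A": lambda s: s["A"] or not s["B"] or not s["C"],
    "B": lambda s: s["A"] or s["C"],
    "C": lambda s: not s["B"],
}


def successors(state):
    """Asynchronous successors: update one variable at a time, keep s != t."""
    s = dict(zip(VARS, state))
    out = set()
    for pos, var in enumerate(VARS):
        new = int(bool(UPDATE[var](s)))
        if new != state[pos]:
            out.add(state[:pos] + (new,) + state[pos + 1:])
    return out


def reachable(state):
    """All states t with state ->* t (reflexive, transitive closure)."""
    seen, stack = {state}, [state]
    while stack:
        for nxt in successors(stack.pop()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def attractors():
    """A state lies in a TSCC iff everything it reaches can reach it back;
    in that case its reachable set is exactly the TSCC."""
    found = []
    for state in product((0, 1), repeat=len(VARS)):
        reach = reachable(state)
        if all(state in reachable(t) for t in reach) and reach not in found:
            found.append(reach)
    return found
```

States are written as tuples (A, B, C), so the state 110 from Fig. 1
corresponds to (1, 1, 0).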

To classify the attractors of a given BN, we consider three primary kinds of 
long-term behaviour: 


– stability ($\odot$): We say that an attractor is stable if it consists of a single state,
in which the network stays forever.

– oscillation ($\circlearrowleft$): We consider an attractor to be oscillating if it is a single cycle
of states. The size of such a cyclic attractor is often referred to as its period.

– disorder ($\rightleftarrows$): Finally, an attractor is said to be disordered if it is neither stable
nor oscillating. This means that although the network will stay in the attractor
forever, it will behave somewhat unpredictably due to nondeterminism.


The long-term behaviour of a BN is then characterised by a multi-set over the
universe of the three behaviours $\{\odot, \circlearrowleft, \rightleftarrows\}$. We call such a multi-set a behaviour
class and we denote the set of all possible behaviour classes by $\mathcal{C}$. In our example,
the BN has only one attractor, and this attractor is stable; it consists of the 
single state 110, see Fig. 1 (middle). 

To deal with the fact that the update function family F might not be fully 
known, we extend the Boolean network with a set of logical parameters which 
determine the exact behaviour of each update function. These parameters have 
the form of uninterpreted Boolean functions, which can be used as part of the 
update functions’ description. 

Formally, we assume a finite set of parameter names $\mathfrak{P}$, whose elements we
denote by $P$, $Q$, ...; we assume that every $P \in \mathfrak{P}$ has an associated arity $a_P$,
meaning that $P$ is an $a_P$-ary uninterpreted function over Boolean values. Note
that nullary uninterpreted functions are also allowed and can be seen simply
as Boolean parameters. We call an interpretation that assigns to each $P \in \mathfrak{P}$
an $a_P$-ary Boolean function a parametrisation. We usually work with a subset of
parametrisations, called the valid parametrisations and denoted by $\mathcal{P}$.

A parametrised Boolean network consists of a set of variables $V$, a set of
regulations $R \subseteq V \times V$ as in the non-parametrised case, a set of parameter names $\mathfrak{P}$,
its associated set of valid parametrisations $\mathcal{P}$, and a family of parametrised update
functions $\mathfrak{F} = \{F_A \mid A \in V\}$. Each $F_A$ is written as a Boolean expression that
may contain the uninterpreted functions of $\mathfrak{P}$.

Let us now modify the previous example so that we view the BN from Fig. 1
(left) as a parametrised one with the following update functions:
$F_A = A \lor \lnot B \lor \lnot C$, $F_B = P(A, C)$, $F_C = \lnot B$, where $P$ is a parameter name
with arity 2. The set of valid parametrisations is constrained symbolically using
the description of activations and inhibitions in Fig. 1 (left). In this case, there
are only two possible parametrisations $p_1$ (denoted by ●) and $p_2$ (denoted by ♦).
The parametrisation $p_1$ assigns to $P$ the function $(x, y) \mapsto x \lor y$, while $p_2$ assigns
to $P$ the function $(x, y) \mapsto x \land y$. Note that other assignments would violate the
description, namely that both $(A, B)$ and $(C, B)$ are observable and activating.
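
That only two interpretations of P survive the constraints can be confirmed by
enumerating all sixteen binary Boolean functions. The Python sketch below does
so with explicit truth tables; the helper names are hypothetical, and the tool
itself constrains parametrisations symbolically rather than by enumeration.

```python
from itertools import product

# All argument pairs (x, y) in the order 00, 01, 10, 11.
INPUTS = list(product((0, 1), repeat=2))


def all_binary_functions():
    """Yield every Boolean function of two arguments as a truth table."""
    for outputs in product((0, 1), repeat=4):
        yield dict(zip(INPUTS, outputs))


def flip_pairs(arg):
    """Pairs of inputs that differ only in argument `arg` (0 first, then 1)."""
    for other in (0, 1):
        low = (0, other) if arg == 0 else (other, 0)
        high = (1, other) if arg == 0 else (other, 1)
        yield low, high


def observable(table, arg):
    """Changing the argument changes the output for at least one input."""
    return any(table[lo] != table[hi] for lo, hi in flip_pairs(arg))


def activating(table, arg):
    """Increasing the argument never decreases the output."""
    return all(table[lo] <= table[hi] for lo, hi in flip_pairs(arg))


def valid_parametrisations():
    """Interpretations of P that are observable and activating in both inputs."""
    return [t for t in all_binary_functions()
            if all(observable(t, a) and activating(t, a) for a in (0, 1))]
```

The four other monotone functions (the two constants and the two projections)
are rejected because they ignore at least one argument and are therefore not
observable in it.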

By fixing a concrete parametrisation p € P, we can interpret all the param- 
eter names and thus transform the parametrised update functions into non- 
parametrised ones, obtaining a (non-parametrised) BN, called the p-instantiation 
of the parametrised BN. We then generalise the definition of attractors to 
parametrised BNs, saying that a set of states S is an attractor in parametri- 
sation p € P if S is an attractor in the p-instantiation. 

The asynchronous semantics of a parametrised BN can be described using an
edge-coloured state transition graph. The transitions of this graph are
assigned a set of so-called colours; in our case, the colours correspond
exactly to the parametrisations. The states are given as in the
non-parametrised case. We then say that $s \to t$ if there exists a
parametrisation $p$ such that $s \to t$ in the p-instantiation. The set of
colours of $s \to t$ is the set of all such parametrisations. In our example,
the graph is depicted in Fig. 1 (right; the edges are annotated with @, ¢, or
both).
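
An explicit-state sketch of this construction for the running example is given below. It assumes the update functions $F_A = A \lor \lnot B \lor \lnot C$, $F_B = P(A, C)$, $F_C = \lnot B$ and the two valid parametrisations from the example; the colour labels "p1" and "p2" and all function names are chosen here for illustration. The tool itself represents the graph symbolically, so this enumeration only shows what the colours mean.

```python
from itertools import product

# The two valid interpretations of the parameter P (labels are illustrative).
PARAMETRISATIONS = {
    "p1": lambda x, y: x or y,   # P interpreted as disjunction
    "p2": lambda x, y: x and y,  # P interpreted as conjunction
}


def updates(state, p):
    """Values of (F_A, F_B, F_C) in the given state with P interpreted as p."""
    a, b, c = state
    return (a or not b or not c, p(a, c), not b)


def successors(state, p):
    """Asynchronous successors: exactly one variable is updated per step."""
    target = updates(state, p)
    for i in range(3):
        if target[i] != state[i]:
            yield state[:i] + (target[i],) + state[i + 1:]


def coloured_edges():
    """Map every edge (s, t) to the set of parametrisations that enable it."""
    edges = {}
    for state in product([False, True], repeat=3):
        for name, p in PARAMETRISATIONS.items():
            for t in successors(state, p):
                edges.setdefault((state, t), set()).add(name)
    return edges


def show(state):
    return "".join("1" if value else "0" for value in state)


for (s, t), colours in sorted(coloured_edges().items()):
    print(show(s), "->", show(t), sorted(colours))
```

For instance, the edge from 100 to 101 carries both colours, while the edge from 110 to 100 exists only under "p2".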


Problem Formulation. We now formulate the problem of attractor bifurcation
analysis of parametrised BNs as follows: Given a parametrised BN with a set of
valid parametrisations $P$, compute the bifurcation function
$\mathcal{A} : P \to \mathcal{C}$ that assigns to each parametrisation $p$ the
behaviour class of the p-instantiation of the given parametrised BN.

In our example, the function $\mathcal{A}$ maps $p_1$ (@) to {©} (one stable
attractor {110}) and $p_2$ (¢) to {©} (one oscillating attractor
{100, 101, 111, 110}).
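
The example can be reproduced by explicit enumeration. The sketch below computes the attractors of each p-instantiation as the terminal strongly connected components of its asynchronous state transition graph and then assembles the bifurcation function as a dictionary. The classification rule is an assumption made for this sketch (a single state is stable, a cycle in which every state has exactly one successor is oscillating, anything else is disordered), as are all names; the symbolic algorithm of the tool is not reproduced here.

```python
from itertools import product


def make_network(p):
    """Update functions (F_A, F_B, F_C) with the parameter P interpreted as p."""
    return lambda a, b, c: (a or not b or not c, p(a, c), not b)


def successors(state, network):
    """Asynchronous successors: one variable changes per transition."""
    target = network(*state)
    return [state[:i] + (target[i],) + state[i + 1:]
            for i in range(3) if target[i] != state[i]]


def reachable(state, network):
    """All states reachable from the given state (including itself)."""
    seen, stack = {state}, [state]
    while stack:
        for t in successors(stack.pop(), network):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return frozenset(seen)


def attractors(network):
    """Terminal SCCs: sets of states that can always return to each other."""
    states = list(product([False, True], repeat=3))
    reach = {s: reachable(s, network) for s in states}
    return {reach[s] for s in states if all(s in reach[t] for t in reach[s])}


def classify(attractor, network):
    """Assumed classification rule, see the lead-in above."""
    if len(attractor) == 1:
        return "stable"
    if all(len(successors(s, network)) == 1 for s in attractor):
        return "oscillating"
    return "disordered"


def bifurcation_function(parametrisations):
    """Map each parametrisation name to the sorted classes of its attractors."""
    result = {}
    for name, p in parametrisations.items():
        network = make_network(p)
        result[name] = tuple(sorted(classify(a, network) for a in attractors(network)))
    return result


A = bifurcation_function({"p1": lambda x, y: x or y, "p2": lambda x, y: x and y})
print(A)  # {'p1': ('stable',), 'p2': ('oscillating',)}
```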


3 Attractor Bifurcation Analysis with AEON 


The workflow of our approach, as implemented in the tool, is illustrated in Fig. 2. 
As an input, we take a parametrised BN including a graphical description of the 
regulations. The tool computes its asynchronous semantics as a symbolic edge- 
coloured graph represented using BDDs [8]. This is then used as an input to 
a parallel TSCC detecting algorithm based on [1], which extracts the attractors 
on the fly. Each attractor is classified as one of the three above-mentioned types 
and this information is used to incrementally build the bifurcation function A, 
also represented symbolically using BDDs. More details about the algorithm as 
well as the classification procedure can be found in [4]. 

The bifurcation function induces a partitioning of the parameter space in
which two parametrisations are equivalent if their p-instantiations have the same
behaviour class. This partitioning is presented to the user as a list of behaviour
classes together with the cardinality of the respective parameter space partitions,
see Fig. 3. The user can select one of these classes and obtain a witness BN,
i.e. a p-instantiation of the parametrised BN where p is one of the corresponding
parametrisations. Finally, the tool also provides the whole bifurcation function
encoded as BDDs; this output can be used for post-processing by further tools.

Fig. 2. The workflow of the AEON tool.


4 Implementation 


The tool architecture consists of two components as seen in Fig. 4: the compute 
engine, and a web-based, user-facing GUI application (the client). The engine 
is responsible for the actual computation and acts as a web server to which the 
client establishes a connection. Using a web-based GUI enables portability across
different platforms, and the separation of the user interface from the compute 
engine enables the user to run the computation remotely on high-performance 
hardware. 

Fig. 3. Screenshot of the tool displaying a parametrised BN together with the
bifurcation analysis results.

One of the responsibilities of the client is to provide a user-friendly,
multi-platform editor of parametrised BNs, since no popular BN editors currently
support parameters. Architecturally, the client consists of several modules: 


— Live Model: In-memory representation of the currently displayed model. 

— Compute Engine Connection maintains the communication between the 
client and the compute engine. 

— Network Editor: An interactive drag-and-drop editor for drawing the struc- 
ture of the BN (variables, regulations). The implementation is based on the 
popular Cytoscape [19] library for graph visualisation and manipulation. 

— Parametrised BN Editor: The update functions can be modified in a sep- 
arate parametrised BN editor tab. This module is also responsible for basic 
integrity checks and static analysis of the BN, some of which is asynchronously 
deferred to the compute engine. 

— Import/Export facilitates serialisation and transfer of the BNs to other tools. 
We currently provide a compact text-based format specifically designed for 
AEON and a universally adopted XML-based SBML level 3 qual standard [9]. 

Fig. 4. Overview of the tool architecture showing the main components of the GUI
client and the compute engine. Arrows represent the general flow of information 
between individual components. 


The compute engine is written entirely in Rust to ensure fast and reliable 
operation (as well as easy portability). The functionality of the engine is split 
into separate libraries to allow later reuse: 


- lib-BDD: Our own robust, thread-safe, scalable Rust-based implementation
of BDDs.

- lib-PBN: A general purpose library for working with parametrised BNs.
It provides serialisation to and from the AEON text format as well as
SBML. Most importantly, it provides a parameter encoder that maps sets
of parametrisations of the parametrised BN to BDDs. Using this encoder,
the library implements an on-the-fly generation of the edge-coloured state
transition graph corresponding to the asynchronous semantics of the given
parametrised BN.

- TSCC Search algorithm implements the component search algorithm as
presented in [1]. The algorithm uses parallel reachability procedures as well as
asynchronous processing of independent parts of the state space to fully
utilise available CPUs and thus speed up the computation. The algorithm
is extended with appropriate cancellation points so that the user can stop the
computation when needed.

- TSCC Classifier classifies and stores information about the discovered
components. Specifically, for each non-empty behaviour class, we store a BDD
representation of the parametrisations that result in this type of behaviour.


Aside from the general overview of the tool, we would like to highlight two 
additional aspects of AEON: 


On-the-Fly Results: The attractors are discovered gradually. At any time during 
the computation the user may inspect the partial result, i.e. the bifurcation 
function computed so far. Although this is not the final outcome, such partial 
information can still prove useful, e.g. if unexpected attractor behaviour is found 
and the update functions of the model need to be adjusted. 


SBML with Parameters: In our implementation, when dealing with fully instan- 
tiated networks, we always output valid SBML. Unfortunately, the current 
SBML standard does not allow parameters or uninterpreted functions inside the 
update function terms. In fact, the update functions in SBML are represented 
using MathML (see footnote 1), which in general allows arbitrary mathematical expressions, but
its use in SBML is restricted. To export parametrised BNs, we intentionally dis- 
regard the restriction and our tool produces MathML formulae with parameters. 
Note that existing SBML implementations can be easily extended to also support 
parametrised BNs, since they already contain MathML parsers. 

Both the client (see footnote 2) and the compute engine (see footnote 3) are
released as open source under the MIT License. Furthermore, an online version
of the client is available at https://biodivine.fi.muni.cz/aeon/, including
links to pre-built binaries of the compute engine for all major OSes.


5 Evaluation 


We evaluated the efficiency and applicability of the AEON tool on a set of real
biological models taken from the GINsim model database [10], ranging from
small toy examples to large real-world models. The experiments were performed
on a 32-core AMD Ryzen workstation with 64GB of memory. All tested models
are available in the AEON source code repository (see footnote 3) as benchmark
models.


1 https://www.w3.org/TR/MathML3/.
2 https://github.com/sybila/biodivine-aeon-client.
3 https://github.com/sybila/biodivine-aeon-server.

The results are reported in Table 1. In general, the results show that the
combination of symbolic representation of parametrisations and shared-memory 
parallel exploration of the state space allowed us to handle realistic BNs with 
large parameter spaces and non-trivial number of attractor bifurcations in rea- 
sonable time. Finally, let us note that the findings provided by AEON are in line 
with known properties of these biological models and even have a potential to 
provide new insights on the modelled biological processes. 


Table 1. The evaluation results. Number of classes refers to the number of distinct
behaviour classes discovered by the algorithm. The times in the form minutes:seconds
refer to total runtime on 1 and 32 CPU cores respectively.

Model name               | State space size | Param. space size | No. of classes | Time (1 CPU) | Time (32 CPUs)
Asymmetric Cell Division | 2^5              | ~2!8              | 11             | 0:05.62      | 0:03.39
Budding Yeast (Orlando)  | 2°               | ~2!8              | 6              | 0:35.22      | 0:02.93
TCR Signalisation        | oo               | ~2'4              | 17             | 0:26.61      | 0:04.42
Drosophila Cell Cycle    | gM               | ~2^36             | 8              | 27:48.1      | 1:42.28
Fission Yeast Cell Cycle | 21°              | ~2^31             | 201            | 25:20.9      | 4:00.29
Mammalian Cell Cycle     | 21°              | ~2"4              | 176            | 38:39.6      | 8:02.14
Budding Yeast (Irons)    | 2'8              | ~2?6              | 7              | Timeout      | 52:28.1


In particular, in the case of the TCR Signalisation model, the authors have 
shown in [26] that their non-parametrised model produces seven possible stable 
states and one non-trivial attractor. By using AEON, we were able to confirm 
their findings as well as analyse a fully parametrised version of the model, finding 
sixteen other possible behaviours. Interestingly, in this model, all discovered 
seventeen behaviour classes consist of exactly eight attractors. 

For the Budding Yeast (Orlando) model [16], the authors state that for several 
different parametrisations, the model always reaches a stable state (based on 
simulation). Our analysis performed with AEON has confirmed that the original 
instantiation of the model has indeed a single stable attractor. Moreover, we have 
found that in the fully parametrised version of the model, almost ninety thousand 
instantiations have a single stable attractor. Additionally, we have found
that an almost equal number of instantiations produce disordered attractors,
and several produce oscillating attractors. AEON can generate witnesses
for all of these situations, thus raising biological questions about the
existence of the corresponding phenotypes in nature.

The Fission Yeast Cell Cycle model [15] is known to contain one primary 
stable attractor as well as eleven artificial attractors. It is known that various 
multi-valued modifications of the original model exist that remove these
artificial stable attractors from the model while preserving the single primary
stable attractor [16]. By parametrising the model adequately and applying our method
using AEON, we have discovered that a large portion of the parameter space of 
the model also produces a single stable attractor. 

Abstract. Barrier certificate generation is widely used in verifying
safety properties of hybrid systems because of its relatively low
computational complexity. Under sum of squares (SOS) relaxation,
the problem of barrier certificate generation is equivalent to that of
solving a bilinear matrix inequality (BMI) of a particular type. The paper
reveals the special feature of the problem and exploits it to build a novel
computational method. The proposed method introduces a sequential
iterative scheme that is able to find analytical solutions, in contrast to
the nonlinear solving procedure used by general BMI solvers, which produces
numerical solutions, and is thus more efficient than them. In addition,
unlike popular methods based on LMI solving, it does not make the
verification conditions more conservative, and thus reduces the risk of
missing feasible solutions. Benefitting from these two appealing features,
it can produce barrier certificates not amenable to existing methods,
which is supported by a complexity analysis as well as experiments
on some benchmarks.

1 Introduction


Cyber-physical systems (CPS) consist of tightly coupled physical components,
such as electrical, mechanical, hydraulic, and biological components, and software
systems. They are deeply involved in many safety-critical systems, for example,
high confidence medical devices, traffic control and safety systems, advanced
automotive systems and critical infrastructure control systems. Safety
verification helps to ensure that they do not behave dangerously.

Hybrid systems are popular models used in the verification of cyber-physical
systems because of their ability to describe interacting discrete transitions and
continuous dynamics [18]. Safety verification checks safety properties by
determining whether a system can evolve to some state violating the desired safety
properties when it starts from some initial conditions. A successful verification of
a hybrid system can raise our confidence in its corresponding cyber-physical
system.

For cyber-physical systems with real-time constraints, fast verification is
a vital requirement. For example, an online verification module in a monitoring
system should return the result before the deadline is reached. The paper aims at
fast verification of hybrid systems to satisfy this requirement.

Intuitively, safety verification of hybrid systems can be performed by
computing the reachable set. Approaches based on reachable set computation
explicitly compute either exact or approximate reachable sets corresponding to the
dynamics in the model, and then compare them with unsafe regions. They have been
successfully adopted in verifying behaviors of a system within a finite horizon.
However, due to their intrinsic computational difficulty, approaches of this kind
can hardly scale up to complex non-linear systems.

Many research efforts have been devoted to barrier certificate generation. A
barrier certificate is a function whose zero level set separates the unsafe
region from all reachable states of a system. It requires that all system
trajectories starting from some initial conditions fall on one side of the barrier
certificate while the unsafe region resides on the other. As the existence of a
barrier certificate implies that the unsafe region is not reachable, the safety
verification problem can be transformed into the problem of barrier certificate
generation. Compared with reachable set computation [31], barrier certificate
generation requires much less computation, since the unsafe region guides the
search for a barrier certificate. In particular, it performs very well when a
safety property concerns an infinite time horizon [21,34].

Barrier certificate generation is a computation intensive task. A set of ver- 
ification conditions corresponding to a specific type of barrier certificates is 
given at first. Then they are encoded into some constraints on state variables 
and unknown coefficients of barrier certificates of a specific type. Finally, those 
unknown coefficients are determined by solving the constraints [27]. Thus, how 
to encode verification conditions and solve them in an effective way is a critical 
and challenging problem in barrier certificate based verification. 

Acting as the barrier between reachable states and the unsafe region, a
barrier certificate should always evaluate to be nonnegative or negative
accordingly, regardless of its type. To achieve this, the most popular
computational method uses Putinar's Positivstellensatz to derive a sum of
squares (SOS) program of the barrier certificate, which results in a bilinear
matrix inequality (BMI) solving problem belonging to the class of NP-hard
problems [20,21]. An effective and efficient BMI solver is a prerequisite for
success in exploiting SOS relaxation based methods.

The general BMI problem can be solved by the commercial BMI solver
PENBMI [14], which combines the (exterior) penalty and (interior) barrier
method with the augmented Lagrangian method, at the cost of a very high
computational complexity. To make the problem more tractable, convex SOS
relaxation based methods have become popular. They transform the (non-convex)
BMI problem into a (convex) linear matrix inequality (LMI) problem by fixing
some multipliers and then solve it quickly via convex optimization such as
semidefinite programming (SDP). Unfortunately, the removal of non-convexity may
yield verification conditions so conservative that the solution to the original
BMI problem is invisible to the derived LMI problem.

The paper focuses on quickly solving the BMI problem derived from SOS
relaxation by directly attacking the problem without relaxing it to an LMI one.
Taking advantage of a special feature of the problem, namely that all bilinear
terms are cross terms between different parameter vectors, a sequential iterative
scheme is proposed. It treats the non-convex BMI problem directly so as to avoid
the loss of precision that accompanies the removal of non-convexity. Meanwhile,
it has a much lower computational complexity than the PENBMI solver. Hence, the
proposed method spends much less time in computation and has the potential
to find solutions beyond the reach of existing methods.

To be specific, a feasible solution to the BMI problem can be found by a dual
augmented Lagrangian iterative framework. At each iteration, the minimization
over the four sets of primal variables is divided into four sequential minimization
problems, each with respect to one set of primal variables while the other three
sets are fixed. On the theoretical side, we show that our method returns a feasible
solution in cubic time, while the PENBMI solver requires quartic time. We have
developed a prototype tool implementing the proposed method and compared it with
the PENBMI solver and the LMI solver SOSTOOLS [22] on a set of benchmarks
gathered from the literature. The experiments show that our tool is more
effective than both and has a much lower computational complexity than the
PENBMI solver.

The paper is organized as follows. Section 2 describes the connection between
safety verification and barrier certificate generation. Section 3 addresses how
to transform the problem of barrier certificate generation into a BMI solving
problem. In Sect. 4, a sequential iterative scheme is presented, followed by a
complexity analysis. Section 5 contains detailed examples illustrating the use of
our method as well as the experiments on benchmarks. We compare with related
work in Sect. 6 before concluding in Sect. 7.


A Novel Approach for Solving the BMI Problem 585 


2 Preliminaries 


Notations. Let $\mathbb{R}$ be the field of real numbers. $\mathbb{R}[x]$
denotes the polynomial ring with coefficients in $\mathbb{R}$ over variables
$x = [x_1, x_2, \dots, x_n]^T$. Let $\Sigma[x] \subset \mathbb{R}[x]$ be the
space of SOS polynomials. $\mathbb{S}^n$ denotes the set of $n \times n$
symmetric matrices, and the notation $B \succeq 0$ means that the matrix
$B \in \mathbb{S}^n$ is positive semidefinite. $\langle A, B \rangle$ denotes
the inner product between $A$ and $B$.
A continuous dynamical system is modeled by a finite number of first-order
ordinary differential equations

$\dot{x} = f(x)$, (1)

where $\dot{x}$ denotes the derivative of $x$ with respect to the time variable
$t$, and $f(x) = [f_1(x), \dots, f_n(x)]^T$ is called the vector field, defined
on an open set $\Psi \subseteq \mathbb{R}^n$. We assume that $f$ satisfies the
local Lipschitz condition, which ensures that given $x = x_0$, there exists a
time $T > 0$ and a unique function $\tau : [0, T) \to \mathbb{R}^n$ such that
$\tau(0) = x_0$. The function $x(t)$ is called a solution of (1) that starts at
a certain initial state $x_0$, that is, $x(0) = x_0$. Such an $x(t)$ is also
called a trajectory of (1) from $x_0$.


Definition 1 (Continuous System). A continuous system over $x$ consists of
a tuple $S : \langle \Theta, f, \Psi \rangle$, wherein
$\Theta \subseteq \mathbb{R}^n$ is a set of initial states and $f$ is a vector
field over the domain $\Psi \subseteq \mathbb{R}^n$.


A hybrid system is a system which exhibits mixed discrete-continuous behav- 
iors. A popular model for representing hybrid systems is hybrid automata [1], 
which combine finite state automata modeling the discrete dynamics, and dif- 
ferential equations modeling the continuous dynamics. 


Definition 2 (Hybrid Automata). A hybrid automaton is a tuple
$H : \langle L, X, F, \Psi, E, \Xi, \Delta, \Theta, \ell_0 \rangle$, where

- $L$ is a finite set of locations (or modes);

- $X \subseteq \mathbb{R}^n$ is the continuous state space. The hybrid state
space of the system is defined by $\mathcal{X} = L \times X$ and a state is
defined by $(\ell, x) \in \mathcal{X}$;

- $F : L \to (\mathbb{R}^n \to \mathbb{R}^n)$ assigns to each location
$\ell \in L$ a locally Lipschitz continuous vector field $f_\ell$;

- $\Psi$ assigns to each location $\ell \in L$ a location condition (location
invariant) $\Psi(\ell) \subseteq \mathbb{R}^n$;

- $E \subseteq L \times L$ is a finite set of discrete transitions;

- $\Xi$ assigns to each transition $e \in E$ a switching guard
$\Xi_e \subseteq \mathbb{R}^n$;

- $\Delta$ assigns to each transition $e \in E$ a reset function
$\Delta_e : \mathbb{R}^n \to \mathbb{R}^n$;

- $\Theta \subseteq \mathbb{R}^n$ is an initial continuous state set;

- $\ell_0 \in L$ is the initial location. The initial state space of the system
is defined by $\ell_0 \times \Theta$.


Trajectories of hybrid systems combine continuous flows and discrete
transitions. Concretely, a trajectory of $H$ is an infinite sequence of states
$\sigma = \{s_0, s_1, s_2, \dots\}$ such that

- [Initiation] $s_0 = (\ell_0, x_0)$, with $x_0 \in \Theta$;

Furthermore, each pair of consecutive states $(s_i, s_{i+1}) \in \sigma$ with
$s_i = (\ell_i, x_i)$ and $s_{i+1} = (\ell_{i+1}, x_{i+1})$ satisfies one of the
following two consecution conditions:

- [Discrete Consecution] $e = (\ell_i, \ell_{i+1}) \in E$, $x_i \in \Xi_e$ and
$x_{i+1} = \Delta_e(x_i)$;

- [Continuous Consecution] $\ell_i = \ell_{i+1}$, and there exists a time
interval $\delta > 0$ such that the solution $x(x_i; t)$ to
$\dot{x} = f_{\ell_i}$ evolves from $x_i$ to $x_{i+1}$, while satisfying the
location invariant $\Psi(\ell_i)$. Formally, $x(x_i, \delta) = x_{i+1}$ and
$\forall t \in (0, \delta]$, $x(x_i, t) \in \Psi(\ell_i)$.


The reachable set $R$ of $H$ contains all states that are elements of at least
one trajectory of $H$, i.e.,
$R = \{s \mid s \in \sigma \text{ for some trajectory } \sigma \text{ of } H\}$.

In this paper, we focus on semi-algebraic hybrid systems, that is, the
corresponding vector fields are polynomials and the sets $\Theta$, $\Psi(\ell)$,
$\Xi_e$, $\Delta_e$ in $H$ are semi-algebraic, represented by polynomial
equations and inequalities. The semi-algebraic sets $\Theta$, $\Psi(\ell)$,
$\Xi_e$, and $\Delta_e$ in Definition 2 are represented as follows:

$\Theta := \{x \in \mathbb{R}^n \mid \theta(x) \geq 0\}$,
$\Psi(\ell) := \{x \in \mathbb{R}^n \mid \psi_\ell(x) \geq 0\}$,
$\Xi_e := \{x \in \mathbb{R}^n \mid \rho_e(x) \geq 0\}$,
$\Delta_e := \{x' \in \mathbb{R}^n \mid \delta_e(x') \geq 0\}$,

where $\ell \in L$, $e \in E$, and $\theta(x)$, $\psi_\ell(x)$, $\rho_e(x)$,
and $\delta_e(x')$ are vectors of polynomials, and the inequalities are
satisfied entry-wise. Suppose that $X_u$ assigns to each location $\ell \in L$
an unsafe region $X_u(\ell)$, defined by

$X_u(\ell) = \{x \in \mathbb{R}^n \mid \zeta_\ell(x) \geq 0\}$,

where $\zeta_\ell$ is a vector of polynomials. The safety specification is
described over the trace of state $(\ell, x)$ w.r.t. unsafe regions $X_u(\ell)$.


Definition 3 (Safety). Given a hybrid system
$H : \langle L, X, F, \Psi, E, \Xi, \Delta, \Theta, \ell_0 \rangle$ and unsafe
regions $X_u(\ell)$, the safety property holds if no trajectory of $H$ starting
from the initial set $\ell_0 \times \Theta$ can evolve to any state specified by
$X_u(\ell)$, i.e., for all $\ell \in L$ and every trajectory $\sigma$ of $H$,
$s \in \sigma \implies s \notin X_u(\ell)$.


For safety verification of hybrid systems, the notion of barrier certificates [21] 
plays an important role. A barrier certificate maps all the states in the reachable 
set R to non-negative reals and all the states in the unsafe region to negative 
reals, thus can be employed to prove safety of hybrid systems. However, the 
exact reachable set R is usually intractable for most hybrid systems. In [21], a 
sufficient inductive condition for barrier certificates is defined as follows. 


Definition 4 (Barrier Certificate). A barrier certificate of hybrid system
$H$ for safety w.r.t. unsafe regions $X_u(\ell)$ is a set of real functions
$\{B_\ell(x)\}$ such that, for all $\ell \in L$ and $e = (\ell, \ell') \in E$,
the following conditions hold:

$\forall x \in \Theta : B_{\ell_0}(x) \geq 0$,
$\forall x \in \Psi(\ell) : B_\ell(x) = 0 \implies \langle \frac{\partial B_\ell}{\partial x}(x), f_\ell(x) \rangle > 0$, (2)
$\forall x \in \Xi_e, \forall x' \in \Delta_e(x) : B_\ell(x) \geq 0 \implies B_{\ell'}(x') \geq 0$,
$\forall x \in X_u(\ell) : B_\ell(x) < 0$.


Note that $\langle \frac{\partial B_\ell}{\partial x}(x), f_\ell(x) \rangle$ is
the Lie derivative of $B_\ell(x)$ with respect to the vector field $f_\ell(x)$.
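
The conditions of Definition 4 can be illustrated on a one-dimensional toy instance. The sketch below is not taken from the paper: the initial set, the unsafe set and the candidate $B(x) = 0.5 - x$ are hypothetical choices for the single-location system $\dot{x} = -x$ on $\Psi = [-1, 1]$, there are no discrete transitions (so the third condition is vacuous), and checking sample points is only a sanity test, not a proof.

```python
def f(x):
    """Vector field of the toy system x' = -x."""
    return -x


def barrier(x):
    """Hypothetical candidate barrier certificate B(x) = 0.5 - x."""
    return 0.5 - x


def barrier_derivative(x):
    """dB/dx for the candidate above."""
    return -1.0


# Sample points of the invariant Psi = [-1, 1].
GRID = [k / 1000 for k in range(-1000, 1001)]
# Hypothetical initial set {x | 0.0625 - x^2 >= 0} = [-0.25, 0.25].
INITIAL = [x for x in GRID if 0.0625 - x * x >= 0]
# Hypothetical unsafe set {x | x - 0.75 >= 0} within Psi.
UNSAFE = [x for x in GRID if x - 0.75 >= 0]
# Sample points on the zero level set of B.
ZERO_LEVEL = [x for x in GRID if abs(barrier(x)) < 1e-9]


def check_conditions():
    """Evaluate the first, second and fourth condition on the sample points."""
    return {
        "initial": all(barrier(x) >= 0 for x in INITIAL),
        "flow": all(barrier_derivative(x) * f(x) > 0 for x in ZERO_LEVEL),
        "unsafe": all(barrier(x) < 0 for x in UNSAFE),
    }


print(check_conditions())
```

Here the Lie derivative at the only zero of $B$ (at $x = 0.5$) is $(-1) \cdot (-0.5) = 0.5 > 0$, so trajectories cannot cross the zero level set towards the unsafe side.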


3 Transfer to BMI 


The problem of generating barrier certificates in Definition 4 is an
infinite-dimensional problem. In order to make it amenable to polynomial
optimization, the barrier certificate $\{B_\ell(x)\}$ should be restricted to a
set of polynomials with an a priori degree bound. Putinar's Positivstellensatz
provides a powerful representation for polynomial positivity on semi-algebraic
sets, which helps to transform the problem of barrier certificate generation
into solving a semidefinite program via SOS relaxation.

Arising from the second and third conditions of Definition 4, where the
parameters of $\{B_\ell(x)\}$ appear on the antecedent sides, the associated SOS
representations using Putinar's Positivstellensatz form non-convex BMI
constraints, which result from the polynomial products between the barrier
certificate and its polynomial multipliers.

In what follows, the procedure for transforming barrier certificate generation 
into BMI solving is recapped in detail. Firstly, SOS relaxation is applied to 
encode the entailment checking in condition (2) as an SOS program. In fact, all 
the conditions of Definition 4 can be expressed as a unified type, say, a polynomial 
is nonnegative (positive) on a semi-algebraic set, which can be characterized by 
Putinar’s Positivstellensatz. 

Let $K$ be a basic semi-algebraic set defined by:

$K = \{x \in \mathbb{R}^n \mid g_1(x) \geq 0, \dots, g_s(x) \geq 0\}$, (3)

where $g_j \in \mathbb{R}[x]$, $1 \leq j \leq s$. Given the finite family
$g = \{g_1(x), \dots, g_s(x)\}$, the polynomial set defined by

$M(g) := \{\sigma_0 + \sum_{i=1}^{s} \sigma_i g_i \mid \sigma_i \in \Sigma[x], 0 \leq i \leq s\}$

is called the quadratic module generated by $g$.

Theorem 1. [Putinar's Positivstellensatz] Let $K \subset \mathbb{R}^n$ be as in
(3). Assume that the quadratic module $M(g)$ is archimedean, namely, there
exists $u(x) \in M(g)$ such that the set $\{x \in \mathbb{R}^n \mid u(x) \geq 0\}$
is compact. If $f(x)$ is strictly positive on $K$, then $f(x)$ can be
represented as

$f(x) = \sigma_0(x) + \sum_{i=1}^{s} \sigma_i(x) g_i(x)$, (4)

where $\sigma_i \in \Sigma[x]$, $0 \leq i \leq s$.

Following Theorem 1, the existence of the representation (4) provides a
sufficient and necessary condition for polynomial positivity on a semi-algebraic
set $K$ [23]. Although the number of auxiliary polynomials in the representation
(4) is only one more than the number of polynomials that define $K$, the degree
bound for $\sigma_i(x)$ is exponential in $n$ and $\deg(f)$. From a
computational point of view, the search for the above representation is
therefore made tractable at the cost of some conservativeness, namely by fixing
a priori a much smaller degree bound $D$ for $\sigma_i(x)$. Thus, a sufficient
condition for the nonnegativity of the given polynomial $f(x)$ on the
semi-algebraic set $K$ is provided as

$f(x) = \sigma_0(x) + \sum_{i=1}^{s} \sigma_i(x) g_i(x)$, (5)

with $\deg(\sigma_i) \leq D$, $\sigma_i \in \Sigma[x]$, $1 \leq i \leq s$. The
representation (5) ensures that a polynomial is nonnegative on a given
semi-algebraic set. At this point, all conditions in Definition 4 can be derived
as a unified type, i.e., polynomial non-negativity on a semi-algebraic set. The
representation (5) is used to characterize the conditions of barrier certificate
generation, since the resulting conditions are more tractable.
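
As a small worked instance of the representation (5), chosen here purely for illustration and not taken from the benchmarks, consider $f(x) = x + 2$ on $K = \{x \in \mathbb{R} \mid g_1(x) = 1 - x^2 \geq 0\}$ with the constant multiplier $\sigma_1 = \tfrac{1}{2}$:

```latex
\begin{align*}
f(x) - \sigma_1(x)\, g_1(x)
  &= (x + 2) - \tfrac{1}{2}\,(1 - x^2) \\
  &= \tfrac{1}{2} x^2 + x + \tfrac{3}{2} \\
  &= \tfrac{1}{2}\,(x + 1)^2 + 1 \;=\; \sigma_0(x).
\end{align*}
```

Both $\sigma_0$ and $\sigma_1$ are sums of squares, so $f = \sigma_0 + \sigma_1 g_1$ certifies that $f$ is nonnegative on $K = [-1, 1]$ with degree bound $D = 0$ for the multiplier.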


Theorem 2. Let the semi-algebraic hybrid system H and the unsafe regions 
X,(@) be defined as the above. Let D be a positive integer. Suppose there exist 
polynomials {B;(x)} and {ve(x)} with deg(ve) < D, positive numbers €¢, and 
€0,2, and vectors of sums of squares a(x), Ae,i(X), Ye(X), Ne(X), (x), p(x) with 
the degree bound D, such that the following expressions: 


are SOSes for each L € L and e € E. Then {Be(x)} satisfies the conditions in 
Definition 4, and therefore guarantees the safety of H. 


Note that a polynomial $f(x)$ with $\deg(f) = 2d$ is a sum of squares if and
only if there exists a real symmetric and positive semidefinite matrix $Q$,
called the Gram matrix, such that $f(x) = v_d(x)^T Q v_d(x)$, where $v_d(x)$ is
the vector consisting of all the monomials of degree less than or equal to $d$.
In view of the
conditions (6) in Theorem 2, the problem of generating the barrier certificates 
requires introducing the auxiliary (Gram matrices) variables. In fact, the decision 
variables in the SOS program (6) are the coefficients of all the unknown polyno- 
mials in (6), such as B(x), a(x), Ae(x) and the associated Gram matrices. The 
polynomial products, i.e., Be(x)ne(x) and Be(x)ve(x), derive some quadratic 
terms of the products of these unknown coefficients, which occur in the second 
and third constraints of (6). As a consequence, the problem for generating bar- 
rier certificates in Theorem 2 derives a non-convex BMI problem. We now show 
the transformation by a simple example. 



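Example 1. Consider the system $\dot{x} = -x$ with location invariant
$\Psi = \{x \in \mathbb{R} : x^2 - 1 \leq 0\}$. Suppose the barrier certificate
$B(x)$ has $\deg(B) = 1$; we predetermine its template as $B(x) = u_0 + u_1 x$
with $u_0, u_1 \in \mathbb{R}$ and $u_1 \neq 0$. For simplicity, here we
consider the second condition in Definition 4, that is, to find $B(x)$ which
satisfies

$\forall x \in \Psi : B(x) = 0 \implies \frac{\partial B}{\partial x} \cdot (-x) > 0$.

Following the SOS relaxation in (6), we need to find $B(x)$ such that

$\phi_0(x) = \frac{\partial B}{\partial x} \cdot (-x) - \phi_1(x) \cdot (1 - x^2) - \phi_2(x) \cdot B(x) - \epsilon$ (7)

and $\phi_1(x)$ are SOSes, $\phi_2(x) \in \mathbb{R}[x]$,
$\epsilon \in \mathbb{R}_{>0}$. We assume that $\phi_1 = u_2$ and $\phi_2 = v$,
with $u_2 \in \mathbb{R}_{\geq 0}$ and $v \in \mathbb{R}$. Then (7) yields
$\phi_0(x) = u_2 x^2 - (u_1 v + u_1) x - u_0 v - u_2 - \epsilon$, and its Gram
matrix representation $\phi_0(x) = v_1(x)^T Q v_1(x)$, where

$Q = \begin{bmatrix} u_2 & -\frac{1}{2} u_1 v - \frac{1}{2} u_1 \\ -\frac{1}{2} u_1 v - \frac{1}{2} u_1 & -u_0 v - u_2 - \epsilon \end{bmatrix}$ and $v_1(x) = \begin{bmatrix} x \\ 1 \end{bmatrix}$.

Since $\phi_0(x)$ and $\phi_1(x)$ must be SOSes, we have $Q \succeq 0$ and
$u_2 \geq 0$, which is equivalent to

$B(u_0, u_1, u_2, v) = \begin{bmatrix} u_2 & 0 & 0 \\ 0 & u_2 & -\frac{1}{2} u_1 v - \frac{1}{2} u_1 \\ 0 & -\frac{1}{2} u_1 v - \frac{1}{2} u_1 & -u_0 v - u_2 - \epsilon \end{bmatrix} \succeq 0$.

Therefore, the requirement that $\phi_0(x)$ and $\phi_1(x)$ are SOSes is
translated into the BMI constraint of the form

$B = B_{0,0} + \sum_{i=0}^{2} u_i B_{i,0} + v B_{0,1} + \sum_{i=0}^{2} u_i v B_{i,1} \succeq 0$, (8)

where all $B_{i,j} \in \mathbb{S}^3$ are constant matrices.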
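
The bilinear structure of Example 1 can be checked numerically. The following sketch builds the $3 \times 3$ matrix once entry by entry and once from constant coefficient matrices (constant, linear in $u$, linear in $v$, bilinear in $u_i v$), and tests positive semidefiniteness at two sample points. The value of $\epsilon$, the sample points and all names are assumptions made for illustration, and the PSD test is specialised to the block-diagonal shape of this particular matrix.

```python
EPS = 0.5  # value chosen here for the positive constant epsilon


def direct_matrix(u, v, eps=EPS):
    """The 3x3 matrix of Example 1, built entry by entry from (u0, u1, u2) and v."""
    u0, u1, u2 = u
    off = -0.5 * u1 * v - 0.5 * u1
    return [[u2, 0.0, 0.0],
            [0.0, u2, off],
            [0.0, off, -u0 * v - u2 - eps]]


def zero():
    return [[0.0] * 3 for _ in range(3)]


def constant_matrices(eps=EPS):
    """Coefficient matrices of the constant, u_i, v and u_i * v terms."""
    const = zero()
    const[2][2] = -eps
    lin_u = [zero(), zero(), zero()]
    lin_u[1][1][2] = lin_u[1][2][1] = -0.5   # coefficient of u1
    lin_u[2][0][0] = lin_u[2][1][1] = 1.0    # coefficient of u2
    lin_u[2][2][2] = -1.0
    lin_v = zero()                           # no term in v alone
    bil = [zero(), zero(), zero()]
    bil[0][2][2] = -1.0                      # coefficient of u0 * v
    bil[1][1][2] = bil[1][2][1] = -0.5       # coefficient of u1 * v
    return const, lin_u, lin_v, bil


def bilinear_matrix(u, v, eps=EPS):
    """Assemble the same matrix from the constant coefficient matrices."""
    const, lin_u, lin_v, bil = constant_matrices(eps)
    return [[const[r][c] + v * lin_v[r][c]
             + sum(u[i] * lin_u[i][r][c] + u[i] * v * bil[i][r][c] for i in range(3))
             for c in range(3)] for r in range(3)]


def is_psd(m):
    """PSD test for a matrix of the shape diag(m[0][0], 2x2 block)."""
    a, b, d = m[1][1], m[1][2], m[2][2]
    return m[0][0] >= 0 and a >= 0 and d >= 0 and a * d - b * b >= 0


def close(m1, m2, tol=1e-12):
    return all(abs(m1[r][c] - m2[r][c]) <= tol for r in range(3) for c in range(3))


u = (2.0, 1.0, 0.5)
print(close(direct_matrix(u, -1.0), bilinear_matrix(u, -1.0)))
print(is_psd(direct_matrix(u, -1.0)), is_psd(direct_matrix(u, 1.0)))
```

With $u = (2, 1, 0.5)$ the constraint holds for $v = -1$ but fails for $v = 1$. Once $v$ is fixed, every entry is affine in $u$, which is why fixing the multipliers turns the BMI into an LMI.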


As illustrated in Example 1, the problem of generating barrier certificates
satisfying condition (6) can be transformed into a BMI problem of the form

Find $u \in \mathbb{R}^p$, $v \in \mathbb{R}^q$
s.t. $B(u, v) = B_{0,0} + \sum_{i=1}^{p} u_i B_{i,0} + \sum_{j=1}^{q} v_j B_{0,j} + \sum_{i=1}^{p} \sum_{j=1}^{q} u_i v_j B_{i,j} \succeq 0$, (9)

where all $B_{i,j}$ are constant symmetric matrices, and
$u = [u_1, \dots, u_p]^T$, $v = [v_1, \dots, v_q]^T$
are parameter coefficients of the unknown polynomials occurring in the original
SOS program. Essentially, the BMI problem (9) is NP-hard. To simplify the
problem considerably, the canonical approach is to replace $v$, which
corresponds to the polynomial multipliers $\eta$ and $\nu$, with a fixed vector.
This strategy reduces the BMI constraint to an associated LMI one.
Unfortunately, the resulting LMI problem is considerably more conservative than
the original BMI one. To be specific, fixing $\eta$ and $\nu$ may result in
verification conditions so conservative that they rule out barrier certificates
which satisfy the non-convex conditions but not the stronger convex conditions.

By investigating (9), we can find a crucial feature of $B(u, v)$, that is, all
cross terms between parameters of $u$ and $v$ are of the form $u_i v_j$. This
feature motivates us to design a more efficient approach for this specific type
of BMI problem.


4 A Sequential Iterative Scheme for Solving BMI 
Problems 


The conventional approaches for solving the BMI problem typically employ the
augmented Lagrangian iterative framework, wherein each iteration involves two
optimization problems, one for the primal variables and one for the dual variables.
Due to the nonlinear (quartic) terms in the associated Lagrangian function, the
first problem has no analytical solution. An iterative nonlinear solving procedure
is therefore needed to obtain numerical solutions, which makes the computation
time-consuming.

Observing the BMI problem (9), we can see that all nonlinear terms are
the cross terms between $u$ and $v$. As a result, the associated dual augmented
Lagrangian function is quartic in all variables jointly, but quadratic with respect
to each single variable. Given this crucial feature, if we choose one variable
as the independent variable and fix the others, we obtain the problem of
minimizing a quadratic function. According to the first-order optimality
condition, for a convex quadratic function $f(x)$, a point $x^*$ is a minimizer of
$f(x)$ if and only if the gradient of $f$ vanishes at $x^*$, i.e., $\nabla f(x^*) = 0$.
As a consequence, the analytical solutions to our optimization problem can be
easily formulated, since the gradient of the associated Lagrangian function is
affine in each single variable.
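As a small worked instance of this first-order argument, consider a strictly convex quadratic. The symbols $A$, $b$ and $c$ are illustrative and do not come from the paper:

```latex
f(x) = \tfrac{1}{2}\, x^{T} A x - b^{T} x + c, \qquad A = A^{T} \succ 0,
\\[4pt]
\nabla f(x) = A x - b \quad \text{(affine in } x\text{)},
\\[4pt]
\nabla f(x^{*}) = 0 \iff A x^{*} = b \iff x^{*} = A^{-1} b .
```

The minimizer is obtained from a single linear solve, which is the pattern that the sub-problems of the scheme are meant to exploit.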

The analytical optimal solutions can be obtained by simple matrix
computations, and are thus much cheaper to compute than numerical solutions that
rely on complicated nonlinear optimization methods. This computational
advantage is further demonstrated by a complexity analysis of our scheme
against the existing BMI solving algorithm, which combines the (exterior)
penalty and (interior) barrier methods with the augmented Lagrangian method;
the analysis is presented later in this section.

To exploit the computational advantage of analytical optimal solutions, for
the first optimization problem (with respect to the primal variables) in each
iteration of the augmented Lagrangian framework, we replace the usual joint
minimization over all primal variables with a sequential minimization scheme.
That is, we divide the problem into four sequential sub-problems, each over one
independent variable while the others are kept fixed. More concretely, the
sub-problem for a single primal variable is constructed by replacing the other
variables with their optimal values obtained in the current iteration (if
available) or in the previous iteration.

This section first introduces an iterative scheme to solve the BMI problem,
then shows how to derive analytical solutions to the sub-problems in each
iteration, and finally gives a complexity analysis against the existing algorithm.


4.1 An Iterative Scheme 


We start by presenting a straightforward reformulation of the BMI problem (9) 
as follows: 


$$\lambda^* = \min\ \lambda \quad \text{s.t.} \quad Z = \lambda \cdot I + B(u, v), \quad Z \succeq 0. \tag{10}$$

Clearly, there exists a feasible solution $(u, v)$ to the BMI problem (9) if and only
if the optimal value of problem (10) is non-positive, i.e., $\lambda^* \le 0$. We now build
an iterative scheme for the optimization problem (10).
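The equivalence can be made explicit with a short derivation. This is a sketch in which $\lambda_{\min}(\cdot)$ denotes the smallest eigenvalue, a notation introduced here for illustration:

```latex
Z = \lambda I + B(u,v) \succeq 0
  \iff \lambda + \lambda_{\min}\bigl(B(u,v)\bigr) \ge 0
  \iff \lambda \ge -\lambda_{\min}\bigl(B(u,v)\bigr),
\\[4pt]
\text{so for fixed } (u,v) \text{ the smallest admissible value is }
  \lambda = -\lambda_{\min}\bigl(B(u,v)\bigr),
\\[4pt]
\text{and this value is non-positive exactly when }
  \lambda_{\min}\bigl(B(u,v)\bigr) \ge 0, \text{ i.e. } B(u,v) \succeq 0 .
```

Minimizing over $(u, v)$ as well then gives a non-positive optimum precisely when some $(u, v)$ satisfies the BMI constraint.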

The augmented Lagrangian function $L_\mu$ associated with (10) is defined as

$$L_\mu(\lambda, u, v, Z, U) = \lambda + \langle U, Z - \lambda I - B(u, v) \rangle + \frac{1}{2\mu} \| Z - \lambda I - B(u, v) \|_F^2, \tag{11}$$

where $\mu > 0$, $\langle \cdot, \cdot \rangle$ denotes the inner product, and $\| \cdot \|_F$ denotes the
Frobenius norm of a matrix. Let $U \in \mathcal{S}^t$ be the Lagrangian multiplier associated
with the equality constraint. The dual function is defined as

$$g(U) = \min_{\lambda, u, v, Z \succeq 0} L_\mu(\lambda, u, v, Z, U),$$

and the Lagrange dual problem associated with (10) is to maximize this dual
function, i.e., $\max_U g(U)$. Clearly, the dual function yields lower bounds on
the optimal value $\lambda^*$ of problem (10), that is, $g(U) \le \lambda^*$ for any $U$.
Applying dual ascent [17] to the augmented Lagrangian function yields the
iterative scheme consisting of the following updates:

$$(\lambda^{k+1}, u^{k+1}, v^{k+1}, Z^{k+1}) := \operatorname*{argmin}_{\lambda, u, v, Z} L_\mu(\lambda, u, v, Z, U^k) \quad \text{s.t. } Z \succeq 0,$$
$$U^{k+1} := \operatorname*{argmax}_{U} L_\mu(\lambda^{k+1}, u^{k+1}, v^{k+1}, Z^{k+1}, U), \tag{12}$$


where the first step is the primal variables update, and the second step is the 
dual variable update. 

The first step in (12) involves quartic terms and lacks an analytical solution.
It requires jointly minimizing $L_\mu(\lambda, u, v, Z, U^k)$ with respect to $\lambda$, $u$, $v$
and $Z$, which can be solved directly by an iterative nonlinear optimization
procedure, but at a high computational cost. Instead of this joint minimization,
we separate the minimization over the primal variables $\lambda, u, v, Z$ into four
steps: $\lambda$, $u$, $v$ and $Z$ are updated in an alternating fashion, each time
minimizing $L_\mu$ with respect to one primal variable with the others fixed. In
detail, the sequential iterative scheme consists of the following iterations:


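$$\lambda^{k+1} := \operatorname*{argmin}_{\lambda} L_\mu(\lambda, u^k, v^k, Z^k, U^k), \tag{13}$$
$$u^{k+1} := \operatorname*{argmin}_{u} L_\mu(\lambda^{k+1}, u, v^k, Z^k, U^k), \tag{14}$$
$$v^{k+1} := \operatorname*{argmin}_{v} L_\mu(\lambda^{k+1}, u^{k+1}, v, Z^k, U^k), \tag{15}$$
$$Z^{k+1} := \operatorname*{argmin}_{Z \succeq 0} L_\mu(\lambda^{k+1}, u^{k+1}, v^{k+1}, Z, U^k), \tag{16}$$
$$U^{k+1} := \operatorname*{argmax}_{U} L_\mu(\lambda^{k+1}, u^{k+1}, v^{k+1}, Z^{k+1}, U). \tag{17}$$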


The above iterative scheme introduces a sequential minimization that treats
the four primal variables one by one. Because explicit formulae for the
minimizers and the maximizer in (13)-(17) are available, the analytical
solutions can be derived directly. Furthermore, as the computation of those
analytical solutions involves only simple matrix operations, such as eigenvalue
decomposition and matrix inversion, it is very efficient.


4.2 Analytical Solutions for the Sequential Iteration 


In this subsection, we focus on how to find analytical solutions to problems 
(13-17) in terms of the first-order optimality conditions. 


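Theorem 3. The minimizer $\lambda^{k+1}$ of (13), i.e.,

$$\lambda^{k+1} := \operatorname*{argmin}_{\lambda} L_\mu(\lambda, u^k, v^k, Z^k, U^k),$$

has the following analytical formula:

$$\lambda^{k+1} = \frac{1}{t} \sum_{i=1}^{t} \bigl( Z^k_{ii} - B_{ii}(u^k, v^k) \bigr) + \frac{\mu}{t} \bigl( \operatorname{Tr}(U^k) - 1 \bigr), \tag{18}$$

where $\operatorname{Tr}(U^k)$ denotes the trace of $U^k$.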


Proof. The first-order optimality condition for (13) is

$$\nabla_\lambda L_\mu = 1 - \operatorname{Tr}(U^k) + \frac{t}{\mu} \lambda - \frac{1}{\mu} \sum_{i=1}^{t} \bigl( Z^k_{ii} - B_{ii}(u^k, v^k) \bigr) = 0.$$

It follows that the $\lambda^{k+1}$ specified in (18) is the optimal solution of (13), which
concludes the proof.

The first-order optimality condition, as used in Theorem 3, can also be
invoked to produce the corresponding analytical solutions to (14) and (15),
respectively.
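The closed form (18) is simple enough to check numerically. The sketch below is illustrative only: it assumes matrices are stored as nested Python lists, and the helper names `lambda_update` and `grad_lambda` are hypothetical rather than taken from the authors' implementation. It evaluates (18) and the derivative of $L_\mu$ with respect to $\lambda$, so that one can confirm the derivative vanishes at the computed value.

```python
def lambda_update(Z, B, U, mu):
    """Evaluate formula (18): the minimizer of L_mu over lambda.

    Z, B, U are t x t matrices given as nested lists; B stands for B(u^k, v^k).
    """
    t = len(Z)
    diag_gap = sum(Z[i][i] - B[i][i] for i in range(t))  # sum_i (Z_ii - B_ii)
    trace_U = sum(U[i][i] for i in range(t))
    return diag_gap / t + (mu / t) * (trace_U - 1.0)


def grad_lambda(lam, Z, B, U, mu):
    """Derivative of L_mu with respect to lambda (first-order condition)."""
    t = len(Z)
    diag_gap = sum(Z[i][i] - B[i][i] for i in range(t))
    trace_U = sum(U[i][i] for i in range(t))
    return 1.0 - trace_U + (t / mu) * lam - diag_gap / mu


# Illustrative 2 x 2 data, made up for this example.
Z = [[2.0, 0.0], [0.0, 1.0]]
B = [[1.0, 0.5], [0.5, -1.0]]
U = [[0.3, 0.0], [0.0, 0.1]]
mu = 0.5

lam = lambda_update(Z, B, U, mu)
# The second printed value should be zero up to rounding error.
print(lam, grad_lambda(lam, Z, B, U, mu))
```

Only the diagonals of $Z$, $B$ and $U$ enter the update, which is why this step is linear in $t$.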


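Theorem 4. Let $v^k = [v^k_1, \ldots, v^k_q]^T \in \mathbb{R}^q$, and define $X^k_i = B_{i0} + \sum_{j=1}^{q} v^k_j B_{ij}$
for $0 \le i \le p$. Let $u^{k+1}$ be the minimizer of (14). Then

$$u^{k+1} = \mu S^{-1} [r_1, \ldots, r_p]^T, \tag{19}$$

where $S = [s_{ij}] \in \mathbb{R}^{p \times p}$ with $s_{ij} = \langle X^k_i, X^k_j \rangle$, and

$$r_i = \Bigl\langle U^k + \frac{1}{\mu} \bigl( Z^k - \lambda^{k+1} I - X^k_0 \bigr),\ X^k_i \Bigr\rangle, \quad 1 \le i \le p.$$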

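Proof. The first-order optimality condition for (14) is

$$\nabla_u L_\mu(\lambda^{k+1}, u, v^k, Z^k, U^k) = (\nabla_{u_1} L_\mu, \nabla_{u_2} L_\mu, \ldots, \nabla_{u_p} L_\mu)^T = 0,$$

and the $i$-th gradient component $\nabla_{u_i} L_\mu(\lambda^{k+1}, u, v^k, Z^k, U^k)$, $1 \le i \le p$, is

$$\Bigl\langle U^k, -\Bigl( \sum_{\ell=1}^{q} v^k_\ell B_{i\ell} + B_{i0} \Bigr) \Bigr\rangle + \frac{1}{\mu} \Bigl\langle Z^k - \lambda^{k+1} I - B(u, v^k), -\Bigl( \sum_{\ell=1}^{q} v^k_\ell B_{i\ell} + B_{i0} \Bigr) \Bigr\rangle.$$

Then we have

$$\nabla_{u_i} L_\mu(\lambda^{k+1}, u, v^k, Z^k, U^k) = \langle U^k, -X^k_i \rangle + \frac{1}{\mu} \langle Z^k - \lambda^{k+1} I - B(u, v^k), -X^k_i \rangle$$

for $i = 1, \ldots, p$.
Thus, $\nabla_u L_\mu(\lambda^{k+1}, u, v^k, Z^k, U^k) = 0$ yields (19), which proves the claim.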
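The step from the vanishing gradient to a closed form can be spelled out as follows. This is a sketch that assumes the Gram matrix $S$ is invertible and uses the identity $B(u, v^k) = X^k_0 + \sum_{j=1}^{p} u_j X^k_j$, which follows from the definition of $X^k_i$:

```latex
0 = \nabla_{u_i} L_\mu
  = -\Bigl\langle U^k + \tfrac{1}{\mu}\bigl(Z^k - \lambda^{k+1} I - X^k_0\bigr),\, X^k_i \Bigr\rangle
    + \tfrac{1}{\mu} \sum_{j=1}^{p} u_j \,\langle X^k_j, X^k_i \rangle
  = -r_i + \tfrac{1}{\mu}\,(S u)_i , \qquad 1 \le i \le p,
\\[4pt]
\Longrightarrow \quad S u = \mu\, r
\quad \Longrightarrow \quad u^{k+1} = \mu\, S^{-1} r , \qquad r = [r_1, \ldots, r_p]^T .
```

The sub-problem thus reduces to one $p \times p$ linear system; the same argument with the roles of $u$ and $v$ exchanged gives the $v$-update.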


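Theorem 5. Let $u^{k+1} = [u^{k+1}_1, \ldots, u^{k+1}_p]^T \in \mathbb{R}^p$, and define $Y^k_j = B_{0j} + \sum_{i=1}^{p} u^{k+1}_i B_{ij}$
for $0 \le j \le q$. Let $v^{k+1}$ be the minimizer of (15). Then

$$v^{k+1} = \mu T^{-1} [w_1, \ldots, w_q]^T, \tag{20}$$

where $T = [t_{ij}] \in \mathbb{R}^{q \times q}$ with $t_{ij} = \langle Y^k_i, Y^k_j \rangle$, and

$$w_i = \Bigl\langle U^k + \frac{1}{\mu} \bigl( Z^k - \lambda^{k+1} I - Y^k_0 \bigr),\ Y^k_i \Bigr\rangle, \quad 1 \le i \le q.$$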

Proof. Similar to the proof of Theorem 4. 


The theorems below demonstrate the analytical solutions to the Z- 
minimization and U-maximization, respectively. 


Theorem 6. Let $Z^{k+1}$ be the minimizer of (16), and $U^{k+1}$ be the solution of
(17). Denote by $P^{k+1}$ the matrix $P^{k+1} := \lambda^{k+1} I + B(u^{k+1}, v^{k+1}) - \mu U^k$. Suppose
$P^{k+1} = Q \Sigma Q^T$ is a spectral decomposition, namely,

$$P^{k+1} = Q \Sigma Q^T = [Q_+ \ Q_-] \begin{bmatrix} \Sigma_+ & 0 \\ 0 & \Sigma_- \end{bmatrix} \begin{bmatrix} Q_+^T \\ Q_-^T \end{bmatrix},$$

where $\Sigma_+$ and $Q_+$ are the nonnegative eigenvalues and the associated orthogonal
eigenvectors, while $\Sigma_-$ and $Q_-$ are the negative eigenvalues and the associated
orthogonal eigenvectors. Then we have

$$Z^{k+1} = Q_+ \Sigma_+ Q_+^T, \tag{21}$$
$$U^{k+1} = -\frac{1}{\mu} Q_- \Sigma_- Q_-^T. \tag{22}$$
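Proof. The first-order optimality condition for (16) is

$$\nabla_Z L_\mu(\lambda^{k+1}, u^{k+1}, v^{k+1}, Z, U^k) = 0. \tag{23}$$

In view of the terms of (23), the problem (16) is translated to

$$Z^{k+1} = \operatorname*{argmin}_{Z \succeq 0} \| Z - \lambda^{k+1} I - B(u^{k+1}, v^{k+1}) + \mu U^k \|_F^2, \tag{24}$$

which reads as

$$Z^{k+1} = \operatorname*{argmin}_{Z \succeq 0} \| Z - P^{k+1} \|_F^2.$$

According to the spectral decomposition of $P^{k+1}$, the result (21) immediately
follows.
From (17), we have

$$U^{k+1} = U^k + \frac{1}{\mu} \bigl( Z^{k+1} - \lambda^{k+1} I - B(u^{k+1}, v^{k+1}) \bigr) = \frac{1}{\mu} \bigl( Z^{k+1} - P^{k+1} \bigr),$$

which yields the result (22).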
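The two updates (21) and (22) amount to splitting $P^{k+1}$ into its parts on the nonnegative and negative eigenvalues. The following standard-library sketch illustrates this under stated assumptions: matrices are nested lists, the function names are hypothetical, and the eigendecomposition uses a textbook cyclic Jacobi rotation as a stand-in for whatever library eigensolver an actual implementation would call.

```python
import math


def jacobi_eigh(A, sweeps=100, tol=1e-24):
    """Eigendecomposition of a symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, V); the eigenvectors are the columns of V.
    """
    n = len(A)
    A = [[float(x) for x in row] for row in A]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # Small-angle rotation that annihilates A[p][q].
                tau = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, tau) / (abs(tau) + math.hypot(1.0, tau))
                c = 1.0 / math.hypot(1.0, t)
                s = t * c
                for M in (A, V):  # right-multiply A and V by the rotation J
                    for k in range(n):
                        mkp, mkq = M[k][p], M[k][q]
                        M[k][p], M[k][q] = c * mkp - s * mkq, s * mkp + c * mkq
                for k in range(n):  # left-multiply A by J^T
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [A[i][i] for i in range(n)], V


def zu_update(lam, B, U, mu):
    """Return (Z_new, U_new) following (21) and (22); B stands for B(u^{k+1}, v^{k+1})."""
    n = len(B)
    P = [[(lam if i == j else 0.0) + B[i][j] - mu * U[i][j] for j in range(n)]
         for i in range(n)]
    w, V = jacobi_eigh(P)

    def rebuild(clip):
        # Sum of clip(w_k) * v_k v_k^T over all eigenpairs.
        return [[sum(V[i][k] * clip(w[k]) * V[j][k] for k in range(n))
                 for j in range(n)] for i in range(n)]

    Z_new = rebuild(lambda x: max(x, 0.0))      # Q_+ Sigma_+ Q_+^T
    P_neg = rebuild(lambda x: min(x, 0.0))      # Q_- Sigma_- Q_-^T
    U_new = [[-P_neg[i][j] / mu for j in range(n)] for i in range(n)]
    return Z_new, U_new


# Illustrative data: P = [[0, 2], [2, 0]] has eigenvalues +2 and -2.
Z_new, U_new = zu_update(0.0, [[0.0, 2.0], [2.0, 0.0]],
                         [[0.0, 0.0], [0.0, 0.0]], 1.0)
print(Z_new, U_new)
```

By construction $Z^{k+1} - \mu U^{k+1} = P^{k+1}$ and both matrices are positive semidefinite, which gives a convenient sanity check for any implementation of this step.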


4.3 Algorithm and Complexity Analysis 


From the above observation in Sect. 4.1 and Sect. 4.2, the detailed procedure for 
the sequential iterative scheme is summarized in Algorithm 1. 


Remark 1. At the beginning of Algorithm 1, $u^0 \in \mathbb{R}^p$ and $v^0 \in \mathbb{R}^q$ are selected
randomly, $Z^0 = M_0 \cdot M_0^T$ where $M_0 \in \mathbb{R}^{t \times t}$ is chosen randomly, and heuristically
$U^0 = \delta \cdot I$, with $\delta > 0$.


Remark 2. There are several options for the stopping criterion of the loop in 
Algorithm 1. That is, Algorithm 1 will stop and return the current result when 
one of the following cases occurs: 


- $|\lambda^{k+1} - \lambda^k| < \epsilon$,
- $\| Z^{k+1} - Z^k \| < \epsilon$,


where $\epsilon$ is a given tolerance. A reasonable value for the stopping criterion might
be $\epsilon = 10^{-6}$.

Algorithm 1: Sequential Iterative Scheme for solving a BMI (SISBMI)

Input: Problem (9); initial values $u^0$, $v^0$, $Z^0$ and $U^0$.
Output: A feasible solution $(u^*, v^*)$ of (9).

while stopping criterion not met do
    Compute $\lambda^{k+1}$ according to (18);
    Compute $u^{k+1}$ and $v^{k+1}$ according to (19) and (20), respectively;
    $B^{k+1} \leftarrow B(u^{k+1}, v^{k+1})$;
    Get the minimal eigenvalue of $B^{k+1}$, denoted by $\hat{\lambda}$;
    if $\hat{\lambda} \ge 0$ then
        $(u^*, v^*) = (u^{k+1}, v^{k+1})$;
        return $(u^*, v^*)$;
    Compute $Z^{k+1}$ according to (21);
    Compute $U^{k+1}$ according to (22).


Complexity Analysis 

We analyze the complexity of Algorithm 1 and compare it with the algorithm
in the PENBMI solver [14], which combines the (exterior) penalty and (interior)
barrier methods with the augmented Lagrangian method. The BMI problem
we study corresponds to a nonconvex optimization problem with quartic terms.
For BMI problems of this special form, neither of the two algorithms is
guaranteed to converge. A complete complexity analysis is not available, as the
number of iterations is not predictable. Therefore, the computational complexity
of one iteration becomes a safe baseline for performance evaluation. In this
paper, we follow the same complexity analysis as in [14], i.e., we analyze the
complexity of one iteration.

Recall that the dimension of the matrix $B(u, v)$ in (9) is $t$, and the numbers
of variables in $u$ and $v$ are $p$ and $q$, respectively. Each iteration of
Algorithm 1 can be divided into five steps. Firstly, updating $\lambda$ costs
$O(t)$ flops, carried out by $3t + 3$ additions. In the $u$-update, the
complexity is clearly dominated by the computation of the inverse of a
$p \times p$ matrix, which costs $O(p^3)$ flops [5]. Analogously, the $v$-update can be done in $O(q^3)$
flops. In the $Z$-update, the critical step is the eigenvalue
decomposition of the matrix $P^{k+1} \in \mathbb{R}^{t \times t}$, so the
$Z$-update requires $O(t^3)$ flops. Finally, the $U$-update requires about
$O(t)$ flops to form $U^{k+1}$.

The complexity of the above steps in each iteration of Algorithm 1 is
summarized as follows:

- Calculation of $\lambda$ → $O(t)$;
- Calculation of $u$ → $O(p^3)$;
- Calculation of $v$ → $O(q^3)$;
- Calculation of $Z$ → $O(t^3)$;
- Calculation of $U$ → $O(t)$.

The total cost of each iteration in Algorithm 1 is then $O(p^3 + q^3 + t^3)$, while
the cost of the algorithm adopted in PENBMI is approximately $O((p + q)t^3 +
(p + q)^2 t^2 + (p + q)^3)$, as shown in [14]. Assuming that $p$, $q$ and $t$ are bounded by
$T \in \mathbb{Z}$, i.e., $T = \max\{p, q, t\}$, the complexity of Algorithm 1 is approximately
$O(T^3)$, whereas the complexity of PENBMI is approximately $O(T^4)$.
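The two asymptotic figures follow from bounding every term by a power of $T$. The constants below are only what this crude bound produces and are not taken from [14]:

```latex
p^3 + q^3 + t^3 \;\le\; 3\,T^3 \;=\; O(T^3),
\\[4pt]
(p+q)\,t^3 + (p+q)^2\,t^2 + (p+q)^3
  \;\le\; 2T \cdot T^3 + 4T^2 \cdot T^2 + 8T^3
  \;=\; 6\,T^4 + 8\,T^3 \;=\; O(T^4),
\qquad \text{using } p + q \le 2T .
```

The extra factor of $T$ comes from the terms that couple the number of decision variables $p + q$ with the matrix dimension $t$.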


5 Experiments 


In this section, we first illustrate our method by verifying a nonlinear continuous
system and then compare our Sequential Iterative Scheme tool, the SISBMI solver,
with two other solvers: PENBMI and SOSTOOLS.


Example 2. Consider the following nonlinear continuous system [28]

$$\begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \\ \dot{x}_3 \end{bmatrix} = \begin{bmatrix} 10(x_2 - x_1) \\ x_1(28 - x_3) - x_2 \\ x_1 x_2 - \frac{8}{3} x_3 \end{bmatrix}$$

with the location invariant

$$\Psi = \{ x \in \mathbb{R}^3 \mid -20 \le x_1, x_3 \le 20,\ -20 \le x_2 \le 0 \}.$$

It is required to verify that all trajectories of the system starting from the initial
set

$$\Theta = \{ x \in \mathbb{R}^3 \mid (x_1 + 14.5)^2 + (x_2 + 14.5)^2 + (x_3 - 12.5)^2 \le 16 \}$$

will never enter the unsafe region

$$X_u = \{ x \in \mathbb{R}^3 \mid (x_1 + 16.5)^2 + (x_2 + 14.5)^2 + (x_3 - 2.5)^2 \le 38.44 \}.$$

It suffices to find a barrier certificate $B(x)$ that satisfies all the conditions in
Definition 3. Suppose that the degree of $B(x)$ is 4, and the degree bound is $D = 6$.
Firstly, we construct a bilinear SOS program (6), which is further transformed
into a BMI problem of the form (9), where the dimension of $B(u, v)$ is 78 and
the number of decision variables is 396. By applying our algorithm, we succeed
in solving the BMI problem and obtain the following barrier certificate


B(x) = —0.0020a} — 0.001343 — 0.0131x7x3 — 0.0022r1 2273 +- - - + 0.0938a1 + 62.5702. 
28 terms 


As shown in Fig. 1, the zero level set of the barrier certificate $B(x)$ (the
steelblue surface) separates $X_u$ (the red ball) from all trajectories starting from
$\Theta$ (the green ball). Therefore, the safety of the above system is verified.

Alternatively, when we apply the PENBMI solver to problem (9), we cannot
find a barrier certificate of degree less than 6.


Fig. 1. Phase portrait of the system in Example 2. (Color figure online)


Example 3. Consider the following hybrid system [20] depicted in Fig. 2, where 


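$$f_1(x) = \begin{bmatrix} x_2 \\ -x_1 + x_3 \\ x_1 + (2x_2 + 3x_3)(1 + x_3^2) \end{bmatrix}, \qquad f_2(x) = \begin{bmatrix} x_2 \\ -x_1 + x_3 \\ -x_1 - 2x_2 - 3x_3 \end{bmatrix}.$$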


[Figure: hybrid automaton with two locations, NO CONTROL and CONTROL. Recoverable labels: $0.99 \le x_1^2 + 0.01x_2^2 + 0.01x_3^2 \le 1.01$; $x_1^2 + x_2^2 + x_3^2 \ge 0.03$; $x_1^2 + 0.01x_2^2 + 0.01x_3^2 \le 1.01$; $0.03 \le x_1^2 + x_2^2 + x_3^2 \le 0.05$.]

Fig. 2. The hybrid automata of the system in Example 3


The system starts in location $\ell_1$ with the initial set

$$\Theta = \{ x \in \mathbb{R}^3 : x_1^2 + x_2^2 + x_3^2 \le 0.01 \}.$$

Our task is to verify that the system will never enter the unsafe set

$$X_u(\ell_2) = \{ x \in \mathbb{R}^3 : 5 \le x_1 \le 5.1 \}.$$


Applying our SISBMI solver, we obtain the polynomial barrier certificate 
with degree 4:

Bo, (x) = 0.05512} + 0.039223 + 0.007923 + 0.0696a722 +--+. — 1.11342 + 2.701,


35 terms 


Bo, (x) = 0.0273214 + 0.0541złz2 — 1.098a123 — 0.521r1x2x3 +- — 2.725a1 + 8.197. 


35 terms 


Our SISBMI solver was implemented in Matlab (2018b) and compared
with two solvers, PENBMI and SOSTOOLS, over a set of benchmarks from the
literature on barrier certificate generation. Among these benchmarks,
examples C1-C12 are semi-algebraic continuous systems and examples H1-H7
are semi-algebraic hybrid systems. The performance is reported in Table 1. All
the experiments were performed on a 2.6 GHz Intel i5 processor under Windows
10 with 8 GB RAM.


Table 1. Algorithm performance on benchmarks

ID            | n  | |L| | d_f | t   | N    | d_S | I_S | T_S     | d_P | I_P | T_P       | d_L | T_L
C1 from [33]  | 2  | 1   | 3   | 21  | 33   | 2   | 32  | 0.2189  | 2   | 24  | 0.9198    | 2   | 0.1949
C2 from [24]  | 2  | 1   | 1   | 30  | 58   | 4   | 73  | 0.5475  | -   | -   | -         | -   | -
C3 from [21]  | 2  | 1   | 3   | 21  | 39   | 2   | 29  | 0.2761  | 2   | 22  | 1.3353    | -   | -
C4 from [30]  | 3  | 1   | 2   | 32  | 72   | 2   | 44  | 0.4126  | 2   | 23  | 1.8237    | 2   | 0.3245
C5 from [26]  | 3  | 1   | 3   | 32  | 72   | 2   | 47  | 0.4761  | 2   | 28  | 1.5435    | 2   | 0.3362
C6 from [3]   | 3  | 1   | 2   | 78  | 396  | 4   | 83  | 4.3598  | -   | -   | -         | -   | -
C7 from [28]  | 4  | 1   | 3   | 50  | 145  | 2   | 72  | 3.9577  | 2   | 28  | 21.0502   | 2   | 3.8658
C8 from [9]   | 3  | 1   | 2   | 32  | 72   | -   | -   | -       | 2   | 40  | 2.4555    | -   | -
C9 from [6]   | 4  | 1   | 2   | 31  | 86   | -   | -   | -       | 2   | 42  | 4.6909    | -   | -
C10 from [13] | 7  | 1   | 2   | 73  | 394  | 2   | 112 | 10.7156 | 2   | 44  | 108.5615  | 2   | 7.2807
C11 from [13] | 9  | 1   | 2   | 102 | 908  | 2   | 264 | 20.6856 | 2   | 30  | 1272.4551 | 2   | 15.8167
C12 from [8]  | 12 | 1   | 1   | 70  | 123  | 2   | 108 | 3.2712  | -   | -   | -         | -   | -
H1 from [25]  | 2  | 2   | 2   | 38  | 65   | 2   | 61  | 0.4899  | 2   | 25  | 2.1499    | 2   | 0.2074
H2 from [36]  | 2  | 2   | 3   | 42  | 69   | 2   | 77  | 0.6331  | 2   | 24  | 2.2786    | 2   | 0.2265
H3 from [15]  | 2  | 2   | 2   | 75  | 138  | 2   | 115 | 3.7394  | -   | -   | -         | -   | -
H4 from [2]   | 2  | 3   | 1   | 42  | 89   | 1   | 70  | 0.5326  | 1   | 21  | 0.9968    | 2   | 0.1856
H5 from [1]   | 3  | 3   | ?   | 67  | ?    | 2   | 112 | 1.0864  | -   | -   | -         | -   | -
H6 from [7]   | 4  | 6   | 2   | 840 | 2736 | 2   | 616 | 48.0548 | -   | -   | -         | -   | -
H7 from [20]  | 3  | 2   | 3   | 170 | 899  | 4   | 219 | 18.7912 | 4   | 32  | 243.9832  | -   | -

In Table 1, $n$ denotes the number of system variables, and $|L|$ denotes the
number of locations; $d_f$ denotes the maximal degree of the polynomials in the
vector fields; $t$ is the dimension of the matrix $B(u, v)$, and $N$ refers to the number
of decision variables appearing in the BMI problem (9), namely, $\dim(u) + \dim(v)$;
$d_S$, $d_P$ and $d_L$ denote the degrees of the barrier certificates obtained via SISBMI,
PENBMI and SOSTOOLS, respectively; $I_S$ and $I_P$ are the numbers of iterations
used by SISBMI and PENBMI, respectively; $T_S$, $T_P$ and $T_L$ record the computation
time in seconds; the symbol - means that the solver was unable to
return a feasible solution within the degree bound $\deg(B) \le 6$.

Table 1 shows that, of the 19 examples, our SISBMI solver successfully
handles 17, while PENBMI and SOSTOOLS succeed on 13 and 9 examples,
respectively. Our SISBMI solver thus provides the best solving capability on
this benchmark set. There are 10 examples that can be treated by the BMI solvers
(either SISBMI or PENBMI) but cannot be solved by the LMI solver SOSTOOLS,
due to the more conservative conditions in the corresponding LMI problems. To
evaluate the best performance of SOSTOOLS, we have tried some widely used
multipliers [16,20], such as 0 and $(1 + x_1^2 + \cdots + x_n^2)$, as well as some polynomial
multipliers with random coefficients and a prior degree bound that guarantees
that the degrees of the polynomials involved in the verification conditions (6) do not
increase. Examples C8-C9 show cases where PENBMI performs
better than our SISBMI solver, a consequence of the fact that both SISBMI and
PENBMI find only locally optimal solutions to the BMI problems.

The above analysis of effectiveness also supports the view that our SISBMI
solver is a necessary complement to the existing tools. As shown in Table 1,
the PENBMI solver covers 13 examples; solving the remaining 6 examples
requires the SISBMI solver.

Regarding efficiency, SOSTOOLS performs the best on almost all the examples
it solves, because of the much lower computational
complexity of solving the relaxed LMI problems. The efficiency of SISBMI and
PENBMI can be compared by examining the ratio
between the execution times of these two solvers in Table 1. For the 11 examples
that are solved by both tools, on average, our SISBMI solver needs 3.4 times as
many iterations as PENBMI but only 0.27 times its running time.
That is, on all the examples solved by both, our SISBMI
solver takes much less time than PENBMI even though it uses more iterations,
which agrees with the complexity analysis of the underlying algorithms. Both
the theoretical analysis and the experiments support that our SISBMI solver is
more efficient than the PENBMI solver.



6 Related Work 


In theory, the problem of barrier certificate generation is a quantifier elimination 
problem. The verification conditions corresponding to a barrier certificate can 
be encoded into a set of constraints on state variables and coefficients where 
the unknown coefficients are existentially quantified and state variables are
universally quantified. Hence, several symbolic computation approaches [11, 19, 29],
such as cylindrical algebraic decomposition (CAD) or Gröbner bases computation,
have been directly applied to attack the associated quantifier elimination
problems. However, due to their high computational complexity, they suffer from
scalability problems.

Due to the relatively low computational complexity, SOS relaxation based 
methods become popular. Rather than directly handling quantified constraints, 
they transform them to a non-convex bilinear matrix inequality. Z. Yang et al. 
[35] relied on the BMI solver PENBMI to compute exact polynomial barrier 
certificates. O. Bouissou et al. [3] applied interval analysis to handle the BMI 
problem derived from the dynamical systems whose initial and unsafe regions 
are restricted to the box form. G. Jessica et al. [10] presented an augmented 
Lagrangian framework for the special case of bilinear programs that arise from 
data flow constraints and correspond to the construction of numerical abstract 
domains aiming at safety verification. 

To alleviate its computational intractability, a convex surrogate has been
proposed that behaves fairly well. Specifically, once the multipliers are fixed,
the BMI problem is further transformed into an LMI problem that can be quickly
solved by convex optimization. S. Prajna et al. [20] first put the idea forward.
A. Sogokon et al. [34] employed the comparison principle associated with the
convex verification conditions to generate vector barrier certificates in safety
verification.

Inspired by the fact that it is the non-convexity of the verification conditions
that prevents well-developed convex optimization from being directly applied, many
convex but stronger verification conditions have been studied. H. Kong et al. [16]
proposed an exponential condition for semi-algebraic hybrid systems. Kapinski et
al. [12] diagnosed convex verification conditions to Lyapunov-based barrier
certificates. C. Sloth et al. [32] considered convex barrier certificates associated with
compositional conditions for a group of interconnected hybrid systems. L. Dai
et al. [4] studied how to balance the convexity of verification conditions with the
expressiveness of barrier certificates. All these convex verification conditions are
equivalent forms of LMI problems. They facilitate problem-solving at the risk of
losing feasible solutions.


7 Conclusion 


We have presented a sequential iterative scheme for solving the BMI problem
derived from the barrier certificate generation of semi-algebraic hybrid systems.
Taking advantage of the special feature of the bilinear terms, the proposed
approach is more efficient than the existing BMI solver. Furthermore, compared
with popular LMI-based methods, the solving procedure does not make
the verification condition more conservative, and thus reduces the risk of
missing solutions. By virtue of these two appealing features, our approach can produce
barrier certificates not amenable to existing methods, which is evidenced by a
theoretical complexity analysis as well as experiments on some benchmarks.


Abstract. In this paper, we study efficient approaches to reachability
analysis for discrete-time nonlinear dynamical systems when the
dependencies among the variables of the system have low treewidth.
Reachability analysis over nonlinear dynamical systems asks if a given set of
target states can be reached, starting from an initial set of states. This is
solved by computing conservative over-approximations of the reachable
set using abstract domains to represent these approximations. However,
most approaches must trade off the level of conservatism against the cost
of performing analysis, especially when the number of system variables 
increases. This makes reachability analysis challenging for nonlinear sys- 
tems with a large number of state variables. Our approach works by con- 
structing a dependency graph among the variables of the system. The 
tree decomposition of this graph builds a tree wherein each node of the 
tree is labeled with subsets of the state variables of the system. Further- 
more, the tree decomposition satisfies important structural properties. 
Using the tree decomposition, our approach abstracts a set of states 
of the high dimensional system into a tree of sets of lower dimensional 
projections of this state. We derive various properties of this abstract 
domain, including conditions under which the original high dimensional 
set can be fully recovered from its low dimensional projections. Next, 
we use ideas from message passing developed originally for belief propa- 
gation over Bayesian networks to perform reachability analysis over the 
full state space in an efficient manner. We illustrate our approach on 
some interesting nonlinear systems with low treewidth to demonstrate 
the advantages of our approach. 


1 Introduction 


Reachability analysis asks whether a target set of states is reachable over a 
finite or infinite time horizon, starting from an initial set for a dynamical sys- 
tem. This problem is fundamental to the verification of systems, and is known to 
be challenging for a wide variety of models. This includes cyber-physical systems, 
physical and biological processes. In this paper, we study reachability analysis 
algorithms for nonlinear, discrete-time dynamical systems. The key challenge in 
analyzing such systems arises from the difficulty of representing the reachable 
sets of these systems. As a result, we resort to over-approximations of
reachable sets using tractable set representations such as intervals [16], ellipsoids,
polyhedra [19], and low degree semi-algebraic sets [2]. Whereas these
representations are useful for reachability analysis, they also trade off the degree of
over-approximation in representing various sets against the complexity of performing
operations such as intersections, unions, projections and image computations
over these sets. The theory of abstract interpretation allows us to design various
abstract domains that serve as representations for sets of states in order to explore
these tradeoffs [17, 18, 34]. However, for nonlinear dynamical systems, these
representations often become too conservative or too expensive as the number of
state variables grows.

In this paper, we study reachability analysis using the idea of tree decompo- 
sitions over the dependency graph of a dynamical system. Tree decompositions 
are a well-known idea from graph theory [37], used to study properties of various 
types of graphs. The treewidth of a graph is an intrinsic property of a graph that 
relates to how “far away” a given graph is from a tree. For instance, trees are 
defined to have a treewidth of 1. Many commonly occurring families of graphs 
such as series-parallel graphs have treewidth 2 and so on. Formally, a tree decom- 
position of a graph is a tree whose nodes are associated with subsets of vertices 
of the original graph along with some key conditions that will be described in 
Sect. 2. We use tree decompositions to build an abstract domain. The abstraction 
operation projects a set of states in the full system state space along each of the 
nodes of the tree, yielding various projections of this set. The concretization com- 
bines projections back into the high dimensional set. We study various properties 
of this abstract domain. First, we characterize abstract elements that can poten- 
tially be generated by projecting some concrete elements along the nodes of the 
tree (so called canonical elements, Definition 10). Next we characterize those 
sets which can be abstracted along the tree decomposition and reconstructed 
without any loss in information (tree decomposable sets, Definition 11). In this 
process, we also derive a message passing approach wherein nodes of the tree can 
exchange information to help refine sets of states in a sound manner. However, 
as we will demonstrate, the abstraction is “lossy” in general since projections of 
tree decomposable sets are not necessarily tree decomposable. We discuss some 
interesting ways in which precision can be regained by carefully analyzing this 
situation. 
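The abstraction and concretization described above, and the loss of precision they can incur, can be illustrated on a tiny finite example. The sketch below is not the paper's grid domain: it assumes three variables, a hypothetical tree with two nodes labelled $\{x, y\}$ and $\{y, z\}$, and a brute-force enumeration over a two-valued domain; all function names are illustrative.

```python
from itertools import product


def project(states, bag):
    """Project full states (dicts: variable -> value) onto the variables in `bag`."""
    return {tuple(s[v] for v in bag) for s in states}


def abstract(states, bags):
    """Abstraction: one projected set per tree node (bag of variables)."""
    return {bag: project(states, bag) for bag in bags}


def concretize(projections, variables, domain):
    """Concretization: every full state over `domain` whose projection onto
    each bag belongs to that bag's set."""
    result = []
    for values in product(domain, repeat=len(variables)):
        s = dict(zip(variables, values))
        if all(tuple(s[v] for v in bag) in allowed
               for bag, allowed in projections.items()):
            result.append(s)
    return result


variables = ("x", "y", "z")
bags = [("x", "y"), ("y", "z")]   # two tree nodes sharing the variable y
domain = (0, 1)

# A set whose x and z values are correlated: projecting loses the correlation.
lossy = [{"x": 0, "y": 0, "z": 0}, {"x": 1, "y": 0, "z": 1}]
recovered = concretize(abstract(lossy, bags), variables, domain)
print(len(lossy), len(recovered))   # prints: 2 4

# A set in which x and z are independent given y is recovered exactly.
exact = [{"x": a, "y": 0, "z": c} for a in domain for c in domain]
print(concretize(abstract(exact, bags), variables, domain) == exact)   # prints: True
```

The first set is over-approximated because the correlation between $x$ and $z$ is not visible in either projection, whereas the second set is reconstructed without loss.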

We combine these ideas together into an approach for reachability analysis of 
nonlinear systems using a grid domain that represents complex non convex sets 
as a union of fixed size cells using a gridding of the state-space. Although such 
a domain would be prohibitively expensive, we show that the tree decomposi- 
tion abstract domain can drastically cut down on the complexity of computing 
reachable set overapproximations in this domain, yielding precise reachable set 
estimation for some nonlinear systems with low treewidth. We demonstrate our 
approach using a prototype implementation to show that for a restricted class of 
systems whose dependency graphs have low treewidth, our approach can be quite 
efficient and precise at the same time. Although some interesting systems have
the low treewidth property, it is easy to see that many systems will have treewidths
that are too high for our approach. Our future work will consider how systems
whose dependency graphs do not have sufficiently low treewidth can still be
tackled in a conservative manner using some ideas from this paper.


1.1 Related Work 


As mentioned earlier, the concept of tree decompositions and treewidth origi- 
nated in graph theory [37]. The concept of treewidth gained popularity when 
it was shown that many NP-complete problems on graphs such as graph col- 
oring could be solved efficiently for graphs with small treewidths [5]. Courcelle 
showed that the problem of checking if a given graph satisfies a formula in the 
monadic second order logic of graphs can be solved in linear time on graphs with 
bounded treewidth [15]. Several NP-complete problems such as 3-coloring can be 
expressed in this logic. Tree decompositions are also used to solve inference prob- 
lems over Bayesian networks leading to representations of the Bayesian networks 
such as junction trees that share many of the properties of a tree decomposi- 
tion [29]. In fact, belief propagation over junction trees is performed by passing 
messages that marginalize the probability distributions at various nodes of the 
tree. This is analogous to the message passing approach described here. 

Tree decomposition techniques have been applied to model checking problems
over finite state systems. For instance, Obdržálek shows that the μ-calculus
model checking problem can be solved in linear time in the size of a finite-state 
system whose graph has a bounded treewidth [35]. However, as Ferrara et al. 
point out, requiring the state graph of a system to have a bounded treewidth is 
often restrictive [24]. Instead, they study concurrent finite state systems wherein 
the communication graph has a bounded tree width. However, they conclude 
that while it is more reasonable to assume that the communication graph has a 
bounded tree width, it does not confer many advantages to verification problems.
For instance, they show that the unrolling of these systems over time potentially 
results in unbounded treewidth. In this paper, we consider a different approach 
wherein we study the treewidth of dependency graphs of the system. We find 
that many systems have small treewidth and exploit this property. At the same 
time, we note that some of the benchmarks studied have “sparse” dependency 
graphs but treewidths that are too large for our approach. 

Tree decomposition techniques have also been studied in static analysis of 
programs. The control and data flow graphs of structured programs without 
goto-statements or exceptional control flow are known to have small treewidth 
that can be exploited to perform compiler optimizations such as register
allocation quite efficiently [38]. Chatterjee et al. have shown how to exploit the
small treewidth property of the control flow graphs of procedures in programs to
perform interprocedural dataflow analysis by modeling the execution of programs
with procedures as recursive state machines [11]. However, this approach seems
restricted to control dominated properties such as sequences of function calls. In
a followup work, they study control and data flow analysis problems for
concurrent systems, wherein each component has constant treewidth [10]. In contrast,
our approach studies dynamical systems and considers tree decompositions of the
data dependency graph.

The use of message passing in this paper closely resembles past work by
Gulwani and Jojic [27]. Therein, a program verification problem involving the
verification of pre/post and intermediate assertions in a program is solved by
passing messages that can propagate information between assertions along program
paths in a randomized fashion. The approach is shown to be similar to loopy 
belief propagation used in Bayesian inference. The key differences are (a) we 
use data dependencies and tree decompositions rather than control flow paths 
to pass information along; and (b) we formally prove properties of the message 
passing algorithm. 

Our approach is conceptually related to a well-known idea of speeding up 
static analysis of large programs using “packing” of program variables [4,28]. 
This approach was used successfully in the Astrée static analyzer [3,4,21].
Therein, clusters of variables representing small sets of dependent local and
global variables are extracted. The remaining program variables are abstracted away and
the abstract interpretation process is carried out over just these variables. The
usefulness of this approach has been borne out in other abstract interpretation efforts,
including Varvel [28]. The key idea in this paper can be seen as a formalization of 
the rather informal “clustering” approach using tree decompositions. We demon- 
strate theoretical properties as well as the ability to pass messages to improve 
the results of the abstract interpretation. 

The use of the dependency graph structure to speed up reachability analysis 
approaches has been explored in the past for speeding up Hamilton-Jacobi-based 
approaches by Mo Chen et al. [12] as well as flowpipe based approaches by 
Xin Chen et al. [13]. Both approaches consider the directed dependency graph
wherein $x_i$ is connected to $x_j$ if the former appears in the dynamical update
equation of the latter variable. The approaches perform a strongly connected
component (SCC) decomposition and analyze each SCC in a topologically sorted
order. However, this approach breaks as soon as the system has large SCCs,
which is common. As a result, Xin Chen et al. show how SCCs can themselves 
be broken into numerous subsets at the cost of a more conservative solution. 
In contrast, the tree decomposition approach can be applied to exploit sparsity 
even when the entire dependency graph is a single SCC. 


2 Preliminaries 


In this section, we will describe the system model under analysis, the dependency
graph structure and the basics of tree decompositions. Let $X : \{x_1, \ldots, x_n\}$
be a set of system variables and $\mathbf{x} : X \to \mathbb{R}$ represent a valuation to these
system variables. Let $D$ be the domain of all valuations of $X$, which describes
the state space of the system. For convenience let $\mathbf{x}_i$ denote $\mathbf{x}(x_i)$. Also, let
$W : \{w_1, \ldots, w_m\}$ represent disturbance variables and $\mathbf{w} : W \to \mathbb{R}$ represent a
vector of $m > 0$ external disturbance inputs that take values in some compact
disturbance space $\mathcal{W}$.


Definition 1 (Dynamical Model). A model $\Pi$ is a tuple $\langle X, W, D, \mathcal{W}, f,
X_0, U \rangle$, wherein $X, W, D, \mathcal{W}$ are as defined above, $f$ is an arithmetic expression
over variables in $X, W$ describing the dynamics, $X_0$ is a set of possible initial
valuations (states) and $U$ is a designated set of unsafe states.

The dynamics are given by $\mathbf{x}(t+1) = \mathrm{eval}(f, \mathbf{x}, \mathbf{w})$, wherein eval evaluates
a given expression $f$, a set of valuations to the system variables $\mathbf{x} \in D$ and
disturbances $\mathbf{w} \in \mathcal{W}$, and returns a new set of valuations for each variable in $X$,
denoted by $\mathbf{x}(t+1)$.


For simplicity, we write $f(\mathbf{x}, \mathbf{w})$ to denote $\mathrm{eval}(f, \mathbf{x}, \mathbf{w})$ for a function
expression $f$. A state of the system is a valuation $\mathbf{x} : X \to \mathbb{R}$ such that $\mathbf{x} \in D$.
Given a finite sequence of disturbance inputs $\mathbf{w}(0), \ldots, \mathbf{w}(T)$, for some $T > 0$
and $\mathbf{w}(i) \in \mathcal{W}$ for all $i \in [0, T]$, an execution of the system is a sequence of
states $\mathbf{x}(0), \ldots, \mathbf{x}(T+1)$, such that $\mathbf{x}(0) \in X_0$, $\mathbf{x}(t) \in D$ for $t \in [0, T+1]$ and
$\mathbf{x}(t+1) = f(\mathbf{x}(t), \mathbf{w}(t))$ for all $t \in [0, T]$. According to these semantics, the system
may fail to have an execution for a given disturbance sequence $\mathbf{w}(t)$, $t \in [0, T]$,
and initial state $\mathbf{x}(0)$ if for some state $\mathbf{x}(t)$, we have $f(\mathbf{x}(t), \mathbf{w}(t)) \notin D$.

A state $\mathbf{x}(t)$ is reachable (at time $t$) if there is an execution of the form
$\mathbf{x}(0), \ldots, \mathbf{x}(t)$, satisfying the constraints above. We say that the unsafe set $U$
is reachable iff some state $\mathbf{x} \in U$ is reachable. Furthermore, we say that $U$ is
reachable within a finite time horizon $T$, iff some state $\mathbf{x} \in U$ is reachable at
time $t \in [0, T]$.


Example 1. Consider a nonlinear example of a dynamical model $\Pi$ with state
space $X : \{x_1, x_2, x_3\}$ and $W : \{w_1\}$. The dynamics can be written as parallel
assignments to the state variables:


zı := 140.2522 — 0.0521 sin(x2), vq := Foti, £3 := £3 — 0.20322, 


The assignments are all evaluated in parallel to update the current state $\mathbf{x}(t)$
to a new state $\mathbf{x}(t+1)$. The domain $D$ is $x_i \in [-3, 3]$ for $i = 1, 2, 3$ and the
disturbance $w_1 \in [-0.1, 0.1]$. The initial set $X_0$ is $x_1 \in [-0.2, 0.2] \wedge x_2 \in
[-0.3, 0] \wedge x_3 \in [0, 0.4]$.


We will now define the dependency (hyper)graph of the system $\Pi$. For
convenience, we write the update function (expression) $f$ of a system $\Pi$ in terms
of individual updates $(f_1, \ldots, f_n)$, wherein $x_i' = f_i(\mathbf{x}, \mathbf{w})$. We say that system
variable $x_i$ (or disturbance variable $w_j$) is a proper input to the expression $f_k$ if
$x_i$ (or $w_j$) occurs as a subterm in $f_k$. Let $\mathrm{inps}(f_k)$ denote the set of all proper
input variables to the function (expression) $f_k$.

As an example, consider $X = \{x_1, \ldots, x_4\}$ and $W = \{w_1, w_2\}$ and the
expression $f : x_1 x_4 - w_1$. The proper inputs to $f$ are $\{x_1, x_4, w_1\}$. We exclude cases such
as g : sn tai) + cos ta) that has {21,22} as proper inputs. However a simplifica- 
tion using elementary trigonometric rules can eliminate them. We will assume 
that all expressions are simplified to involve the least number of variables. 


Definition 2 (Dependency Hypergraph). A dependency hypergraph of a
system $\Pi$ has vertices $V : X \cup W$, given by the union of the system and
disturbance variables, with hyperedge set $E \subseteq 2^V$ given by $E = \{e_1, \ldots, e_n\}$,
wherein for each update $x_k := f_k(\mathbf{x}, \mathbf{w})$ ($k = 1, \ldots, n$), we have the hyperedge
$e_k : \{x_k\} \cup \mathrm{inps}(f_k)$. In other words, each update $x_k := f_k(\mathbf{x}, \mathbf{w})$ yields an edge
that includes $x_k$ along with all the system/disturbance variables that are proper
inputs to $f_k$.


Example 2. The dependency hypergraph for the system from Example 1 has
the vertices $V : \{x_1, x_2, x_3, w_1\}$ and the edges $e_1 : \{x_1, x_2\}$, $e_2 : \{x_2, w_1\}$ and
$e_3 : \{x_2, x_3\}$.
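
The construction in Definition 2 can be sketched in a few lines of Python. This is only an illustration: the function name `dependency_hypergraph` and the encoding of each update by its set of proper inputs are assumptions made here, and the input sets are chosen to agree with the hyperedges listed in Example 2.

```python
def dependency_hypergraph(inputs):
    """Return (vertices, edges) with edges[x_k] = {x_k} | inps(f_k).

    `inputs` maps each system variable x_k to the set of variables that
    occur in its update expression f_k (a hypothetical encoding).
    """
    edges = {xk: frozenset({xk}) | frozenset(inps) for xk, inps in inputs.items()}
    vertices = frozenset().union(*edges.values())
    return vertices, edges

# Proper inputs assumed so that the result matches Example 2.
inps = {"x1": {"x1", "x2"}, "x2": {"x2", "w1"}, "x3": {"x2", "x3"}}
V, E = dependency_hypergraph(inps)
```

Only the occurrence of variables matters here, so the same function applies to any system once the proper inputs of each update are known.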


2.1 Tree Decomposition 


We will now discuss tree decompositions and the associated concept of treewidth
of a hypergraph $G : (V, E)$. The tree decomposition will be applied to the
dependency hypergraphs (Definition 2) for systems $\Pi$ (Definition 1).


Definition 3 (Tree Decomposition and Treewidth). Given a hypergraph
$G : (V, E)$, a tree decomposition is a tree $T : (N, C)$ and a mapping $\mathrm{VERTS} : N \to
2^V$, wherein $N$ is the set of tree nodes, $C$ is the set of tree edges and $\mathrm{VERTS}(\cdot)$
associates each node $n \in N$ with a set of graph vertices $\mathrm{VERTS}(n) \subseteq V$. The tree
decomposition satisfies the following conditions:


1. For each vertex $v \in V$ there exists (at least one) $n \in N$ such that $v \in \mathrm{VERTS}(n)$.

2. For each hyperedge $e \in E$ there exists (at least one) $n \in N$ such that $e \subseteq \mathrm{VERTS}(n)$.

3. For each vertex $v$ and any two nodes $n_1, n_2$ such that $v \in \mathrm{VERTS}(n_1)$ and
$v \in \mathrm{VERTS}(n_2)$, we have $v \in \mathrm{VERTS}(n)$ for each node $n$ along the unique path
between $n_1$ and $n_2$ in the tree. Stated another way, the subset of nodes $N_v :
\{n \in N \mid v \in \mathrm{VERTS}(n)\}$ induces a subtree of $T$ (denoted $T_v$).


The width of a tree decomposition is given by $\max\{|\mathrm{VERTS}(n)| \mid n \in N\} - 1$.
In other words, we find the node $n$ in the tree whose associated set of vertices has
the largest cardinality. We subtract one from this maximal cardinality to obtain
the width. A tree decomposition is optimal for a graph $G$ if no other tree
decomposition exists with a strictly smaller width. The treewidth of a hypergraph
$G$ is given by the width of an optimal tree decomposition.


It is easy to show that if the graph $G$ is a tree, it has treewidth 1. Likewise,
a cycle has treewidth 2.


Example 3. The tree decomposition of the hypergraph $G$ from Example 2 has
three nodes $\{n_1, n_2, n_3\}$ with edges $(n_1, n_2)$ and $(n_2, n_3)$. The nodes along with
the associated vertex sets are as follows:


$n_3 : \{x_2, w_1\}$ -- $n_2 : \{x_2, x_3\}$ -- $n_1 : \{x_1, x_2\}$
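
The three conditions of Definition 3 and the width of a decomposition can be checked mechanically. The sketch below is illustrative only: the function names are hypothetical, the edge list is assumed to describe a tree, and the node labelling is one assignment consistent with the vertex sets of Example 3.

```python
from collections import deque

def is_tree_decomposition(vertices, hyperedges, verts, tree_edges):
    """Check conditions 1-3 of Definition 3; `tree_edges` is assumed to form a tree."""
    nodes = list(verts)
    # Condition 1: every vertex occurs in some node.
    if any(all(v not in verts[n] for n in nodes) for v in vertices):
        return False
    # Condition 2: every hyperedge is contained in some node.
    if any(all(not set(e) <= verts[n] for n in nodes) for e in hyperedges):
        return False
    # Condition 3: the nodes containing v form a connected subtree.
    adj = {n: set() for n in nodes}
    for a, b in tree_edges:
        adj[a].add(b)
        adj[b].add(a)
    for v in vertices:
        holders = {n for n in nodes if v in verts[n]}
        start = next(iter(holders))
        seen, queue = {start}, deque([start])
        while queue:
            cur = queue.popleft()
            for nxt in (adj[cur] & holders) - seen:
                seen.add(nxt)
                queue.append(nxt)
        if seen != holders:
            return False
    return True

def width(verts):
    """Largest node size minus one."""
    return max(len(vs) for vs in verts.values()) - 1

V = {"x1", "x2", "x3", "w1"}
E = [{"x1", "x2"}, {"x2", "w1"}, {"x2", "x3"}]
verts = {"n1": {"x1", "x2"}, "n2": {"x2", "x3"}, "n3": {"x2", "w1"}}
tree_edges = [("n1", "n2"), ("n2", "n3")]
```

For this data the check succeeds and the width is 1. Moving `x3` out of the middle node into `n3` would keep conditions 1 and 2 but break condition 3, since `x2` would then skip a node on the path from `n1` to `n3`.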


Although the tree decomposition is not a rooted tree, we often designate an
arbitrary node $r \in N$ as the root node, and consider the tree $T$ as a rooted tree
with root $r$.

Finding a Tree Decomposition: Interestingly, the problem of finding the
treewidth of a graph is itself an NP-hard problem. However, many practical
approaches exist for graphs with small treewidths. For instance, Bodlaender
presents an algorithm that runs in time $O(k^{O(k^3)})$ to construct a tree
decomposition of width at most $k$ or conclude that the treewidth of the graph is at least
$k + 1$ [6]. Such an approach can be quite useful if a given graph is suspected to
have a small treewidth in the first place. Besides this, many efficient algorithms
exist to approximate the treewidth of a graph to some constant factor. A detailed
survey of these results is available elsewhere [7,8]. Open-source packages such
as HTD can compute treewidth for graphs with thousands of nodes [1]. Finally,
we note that if a tree decomposition of width $k$ can be found, then one can be
found with at most $|V|$ nodes.


Lemma 1. Let $T$ be a tree decomposition for a (multi) graph $G$ with vertices $V$
and treewidth $k$. There exists a tree decomposition $\hat{T}$ of $G$ with the same treewidth
$k$, and at most $|V|$ nodes.


A proof is provided in the extended version of the paper. 


3 Abstract Domains Using Tree Decompositions 


In this section, we will define abstract domains using tree decompositions of
the dependency hypergraph of the system under analysis. Let $\Pi$ be a transition
system over system variables $X$. The concrete states are given by $\mathbf{x} \in D$, wherein
$\mathbf{x} : X \to \mathbb{R}$ maps each state variable $x_j \in X$ to its value $\mathbf{x}(x_j)$ (denoted $\mathbf{x}_j$).


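Definition 4 (Projections). The projection of a state $\mathbf{x}$ to a subset of state
variables $J \subseteq X$, denoted as $\mathrm{proj}(\mathbf{x}, J)$, is a valuation $\hat{\mathbf{x}} : J \to \mathbb{R}$ such that
$\hat{\mathbf{x}}(x_i) = \mathbf{x}(x_i)$ for all $x_i \in J$. For a set of states $S \subseteq D$ and a subset of state
variables $J \subseteq X$, we denote the projection of $S$ along (the dimensions of) $J$ as
$\mathrm{proj}(S, J) : \{\mathrm{proj}(\mathbf{x}, J) \mid \mathbf{x} \in S\}$.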


Definition 5 (Extensions). Let $R$ be a set of states involving just the variables
in the set $J_1 \subseteq X$, i.e., $R \subseteq \mathrm{proj}(D, J_1)$. We define the extension of $R$ into a set
of variables $J_2 \supseteq J_1$ as $\mathrm{ext}_{J_2}(R) : \{\mathbf{x} \in \mathrm{proj}(D, J_2) \mid \mathrm{proj}(\mathbf{x}, J_1) \in R\}$.

In other words, the extension of a set embeds each element in the larger
dimensional space defined by $J_2$, allowing “all possible values” for the dimensions
in $J_2 \setminus J_1$.


We will use the notation $\mathrm{ext}(S)$ to denote the set $\mathrm{ext}_X(S)$, i.e., its extension
to the entire set of state variables $X$. For a state $\mathbf{x}_S$, we will use $\mathrm{ext}(\mathbf{x}_S)$ to denote
$\mathrm{ext}(\{\mathbf{x}_S\})$.


Definition 6 (Product (Join) of Sets). Let $R_1 \subseteq \mathrm{proj}(D, J_1)$ and $R_2 \subseteq
\mathrm{proj}(D, J_2)$. We define $R_1 \otimes R_2 : \{\mathbf{x} : J_1 \cup J_2 \to \mathbb{R} \mid \mathrm{proj}(\mathbf{x}, J_1) \in
R_1 \text{ and } \mathrm{proj}(\mathbf{x}, J_2) \in R_2\}$.

Let $T : (N, C)$ be a tree decomposition of the dependency hypergraph of the
system. Recall that for each node $n \in N$ we associate a set of system/disturbance
variables denoted by $\mathrm{VERTS}(n)$. Let $\mathrm{VERTS}_X(n)$ denote the set of system
variables: $\mathrm{VERTS}(n) \cap X$. We say that an update function $x_k := f_k(\mathbf{x}, \mathbf{w})$ is associated
with a node $n$ in the tree iff $\{x_k\} \cup \mathrm{inps}(f_k) \subseteq \mathrm{VERTS}(n)$.
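
The set operations of Definitions 4 to 6 are easiest to see on a small finite domain. The following sketch is an illustration under assumptions made here: valuations are encoded as sorted tuples of (variable, value) pairs, the domain is a hypothetical finite one, and the helper names `state` and `valuations` are not part of the paper.

```python
from itertools import product

def state(**values):
    """Encode a valuation as a tuple of (variable, value) pairs sorted by name."""
    return tuple(sorted(values.items()))

def proj(states, J):
    """Definition 4: restrict every valuation in `states` to the variables in J."""
    return {tuple((v, val) for v, val in s if v in J) for s in states}

def valuations(J, dom):
    """All valuations over the variables J for a finite per-variable domain."""
    names = sorted(J)
    return {tuple(zip(names, vals)) for vals in product(*(dom[v] for v in names))}

def ext(R, J1, J2, dom):
    """Definition 5: valuations over J2 whose projection onto J1 lies in R."""
    return {x for x in valuations(J2, dom) if proj({x}, J1) <= R}

def join(R1, J1, R2, J2, dom):
    """Definition 6: valuations over J1 | J2 consistent with both R1 and R2."""
    return ext(R1, J1, J1 | J2, dom) & ext(R2, J2, J1 | J2, dom)

dom = {"x1": (1, 2), "x2": (1, 2), "x3": (1, 2)}   # hypothetical finite domain
R1 = {state(x1=1, x2=2)}
R2 = {state(x2=1, x3=1), state(x2=2, x3=1)}
J1, J2 = {"x1", "x2"}, {"x2", "x3"}
```

Here `join(R1, J1, R2, J2, dom)` keeps only the valuation with `x1 = 1`, `x2 = 2`, `x3 = 1`, because the two sets must agree on the shared variable `x2`.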


Lemma 2. For every system variable $x_k$, its update $x_k := f_k(\mathbf{x}, \mathbf{w})$ is
associated with at least one node $n \in N$.


Proof. This follows from the condition on tree decompositions which states that every
hyperedge in the dependency hypergraph must be contained in $\mathrm{VERTS}(n)$ for at least
one node $n \in N$.


3.1 Abstraction and Concretization 


We consider subsets of the concrete states for the system $\Pi$, i.e., the set $2^D$,
ordered by set inclusion as our concrete domain. Given a tree decomposition
$T$, we define an abstract domain through projection of a concrete set along
$\mathrm{VERTS}(n)$ for each node $n$ of $T$.


Definition 7 (Abstract Domain). Each element $s$ of the abstract domain
$A_T$ is a mapping that associates each node $n \in N$ with a set $s(n) \subseteq
\mathrm{proj}(D, \mathrm{VERTS}_X(n))$.

For $s_1, s_2 \in A_T$, $s_1 \sqsubseteq s_2$ iff $s_1(n) \subseteq s_2(n)$ for each $n \in N$.


We will use the notation $\mathrm{proj}(S, n)$ for a node $n \in N$ to denote
$\mathrm{proj}(S, \mathrm{VERTS}_X(n))$.


Definition 8 (Abstraction Map). Given a tree decomposition $T$, the abstraction
map $\alpha_T$ takes a set of states $S \subseteq D$ and produces a mapping that associates
each tree node $n \in N$ to a projection of $S$ along the variables $\mathrm{VERTS}_X(n)$. Formally,


$\alpha_T(S) : \lambda n : N.\ \mathrm{proj}(S, n)$.


Thus, an abstract state $s$ is a map that associates each node $n$ of the tree to
a set $s(n) \subseteq D_n$. We now define the concretization map $\gamma_T$.


Definition 9 (Concretization Map). The concretization $\gamma_T(s)$ of an
abstract state is defined as $\gamma_T(s) : \bigcap_{n \in N} \mathrm{ext}(s(n))$. In other words, we take
$s(n)$ for every node $n \in N$, extend it to the full dimensional space of all system
variables and intersect the result over all nodes $n \in N$.


Example 4. Consider a simple tree decomposition $T$ with 2 nodes $n_1, n_2$ and a
single edge $(n_1, n_2)$. Let $\mathrm{VERTS}(n_1) : \{x_1, x_2\}$ and $\mathrm{VERTS}(n_2) : \{x_2, x_3\}$. Let the
domain $D$ be the set $x_i \in \{1, 2, 3\}$ for $i = 1, 2, 3$. We use the notation $(v_1, v_2, v_3)$
to denote a state $\mathbf{x}$ that maps $x_1$ to the value $v_1$, $x_2$ to the value $v_2$ and so on.
Now consider the set $S = \{(1, 1, 1), (1, 1, 2), (1, 2, 3)\}$. We have that $s : \alpha(S)$
is the mapping that projects $S$ onto the dimensions $(x_1, x_2)$ for node $n_1$ and
$(x_2, x_3)$ for node $n_2$:


$n_1 \mapsto \{(1, 1), (1, 2)\}$, $n_2 \mapsto \{(1, 1), (1, 2), (2, 3)\}$.


Likewise, we verify that the concretization map $\gamma(s)$ yields:

$\gamma(s) : \{(1, 1, 1), (1, 1, 2), (1, 2, 3)\}$.

For convenience, if the tree $T$ is clear from the context, we will drop the
subscripts to simply write $\alpha$ and $\gamma$ for the abstraction and concretization map,
respectively.
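
The abstraction and concretization maps of Definitions 8 and 9 can be replayed on the data of Example 4. The sketch below is a minimal illustration for finite domains: states are tuples $(v_1, v_2, v_3)$, each node is described by the positions of its variables, and the concretization is computed by brute-force enumeration of the domain, which is an assumption that only makes sense for such small examples.

```python
from itertools import product

def alpha(S, verts):
    """Abstraction: project S onto the variable positions of every node."""
    return {n: {tuple(x[i] for i in idx) for x in S} for n, idx in verts.items()}

def gamma(s, verts, domain):
    """Concretization: states of the domain whose projections lie in every s(n)."""
    return {x for x in domain
            if all(tuple(x[i] for i in idx) in s[n] for n, idx in verts.items())}

verts = {"n1": (0, 1), "n2": (1, 2)}          # positions of (x1, x2) and (x2, x3)
domain = set(product((1, 2, 3), repeat=3))
S = {(1, 1, 1), (1, 1, 2), (1, 2, 3)}
s = alpha(S, verts)
```

For this particular set, `gamma(s, verts, domain)` returns `S` again, so no information is lost by the round trip.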


Theorem 1. For any tree decomposition $T$, the maps $\alpha$ and $\gamma$ form a Galois
connection. I.e., for all $S \subseteq D$ and $s \in A_T$: $\alpha(S) \sqsubseteq s$ iff $S \subseteq \gamma(s)$.


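Proof. Let $S, s$ be such that $\alpha(S) \sqsubseteq s$. Therefore, $\mathrm{proj}(S, n) \subseteq s(n)$ for all $n \in N$ by
the definition of $\sqsubseteq$. Pick any $\mathbf{x} \in S$. First, $\mathrm{proj}(\mathbf{x}, n) \in \mathrm{proj}(S, n)$ and therefore,
$\mathrm{proj}(\mathbf{x}, n) \in s(n)$ for all $n \in N$. Thus, $\mathbf{x} \in \mathrm{ext}(s(n))$ for each node $n \in N$.
Therefore, $\mathbf{x} \in \bigcap_{n \in N} \mathrm{ext}(s(n))$, and hence, $\mathbf{x} \in \gamma(s)$, by the definition of $\gamma$. Therefore,
$S \subseteq \gamma(s)$.

Conversely, assume $S \subseteq \gamma(s)$. Since $\gamma(s) = \bigcap_{n \in N} \mathrm{ext}(s(n))$ (from
Definition 9), we have $S \subseteq \mathrm{ext}(s(n))$ for all $n \in N$. Therefore, for all $\mathbf{x} \in S$,
$\mathrm{proj}(\mathbf{x}, n) \in s(n)$. Therefore, $\mathrm{proj}(S, n) \subseteq s(n)$ for every $n \in N$. Finally, this
yields $\alpha(S) \sqsubseteq s$.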

The meet operation is defined as $s_1 \sqcap s_2 : \lambda n.\ s_1(n) \cap s_2(n)$, and likewise,
the join is defined as $s_1 \sqcup s_2 : \lambda n.\ s_1(n) \cup s_2(n)$. We recall two key facts that
follow from the Galois connection between $\alpha$ and $\gamma$.


1. For any set $S \subseteq D$, we have $S \subseteq \gamma(\alpha(S))$. Abstracting a concrete set and
concretizing it back again “loses information”. To see why, we start from
$\alpha(S) \sqsubseteq \alpha(S)$ and apply the Galois connection to derive $S \subseteq \gamma(\alpha(S))$.

2. Likewise, for any abstract domain object $s \in A$, we have $\alpha(\gamma(s)) \sqsubseteq s$. I.e., for
any element $s$, taking its concretization and abstracting it “gains information”.
To prove this, we start from $\gamma(s) \subseteq \gamma(s)$ and conclude that $\alpha(\gamma(s)) \sqsubseteq s$.


Example 5. Returning to Example 4, now consider the set

$\hat{S} = \{(1, 1, 2), (1, 2, 3), (2, 1, 2), (2, 2, 4)\}$.

Its abstraction $\hat{s} : \alpha(\hat{S})$ is given by the mapping:

$n_1 \mapsto \{(1, 1), (1, 2), (2, 1), (2, 2)\}$, $n_2 \mapsto \{(1, 2), (2, 3), (2, 4)\}$.

Notice that $(2, 2, 3)$ and $(1, 2, 4)$ are part of $\gamma(\hat{s})$
but not the original set $\hat{S}$. Similarly, consider the abstract element $s_1 : n_1 \mapsto
\{(1, 1), (1, 2)\}$, $n_2 \mapsto \{(1, 3)\}$. We note that $\gamma(s_1) : \{(1, 1, 3)\}$ and therefore
$\alpha(\gamma(s_1))$ yields the abstract element $s_2 \sqsubseteq s_1$: $n_1 \mapsto \{(1, 1)\}$, $n_2 \mapsto \{(1, 3)\}$.


3.2 Canonical Elements and Message Passing


In the tree decomposition, various nodes share information about the subsets of
vertices associated with each node. Since the subsets have elements in common,
it is possible that a node $n_1$ has information about a variable $x_2$ that is also
present in some other node $n_2$ of the tree. We will now see how to take an abstract
element $s$ and refine each $s(n)$ by exchanging information between nodes in a
systematic manner.

For each edge $(n_1, n_2) \in C$ of the tree, define the set of variables in
common as $\mathrm{CV}(n_1, n_2) : \mathrm{VERTS}(n_1) \cap \mathrm{VERTS}(n_2)$ and $\mathrm{CV}_X(n_1, n_2) : \mathrm{VERTS}_X(n_1) \cap
\mathrm{VERTS}_X(n_2)$.


Definition 10 (Canonical Elements). An abstract element $s$ is said to be
canonical if and only if for each edge $(n_1, n_2) \in C$ in the tree:


$\mathrm{proj}(s(n_1), \mathrm{CV}_X(n_1, n_2)) = \mathrm{proj}(s(n_2), \mathrm{CV}_X(n_1, n_2))$.


In other words, if we took the common variables $\mathrm{VERTS}_X(n_1) \cap \mathrm{VERTS}_X(n_2)$, the
set $s(n_1)$ projected along these common variables is equal to the projection of
$s(n_2)$ along the common variables.


Example 6. Consider the abstract element $s_1$ from Example 5: $n_1 \mapsto
\{(1, 1), (1, 2)\}$, $n_2 \mapsto \{(1, 3)\}$. $\mathrm{proj}(s_1(n_1), \mathrm{CV}(n_1, n_2))$ is the set $\{1, 2\}$ whereas
$\mathrm{proj}(s_1(n_2), \mathrm{CV}(n_1, n_2))$ is simply $\{1\}$. Therefore, $s_1$ fails to be canonical.
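
The canonicity condition of Definition 10 is a purely local check over tree edges. The following sketch is one possible way to test it for finite relations; the representation (each node lists its variables in a fixed order, and its relation is a set of tuples in that order) and the function names are assumptions of this illustration.

```python
def proj(rel, idx, common):
    """Project a relation over variables `idx` onto the variables in `common`."""
    keep = [idx.index(v) for v in sorted(common)]
    return {tuple(t[k] for k in keep) for t in rel}

def is_canonical(s, verts, tree_edges):
    """Definition 10: neighbouring nodes agree on their common variables."""
    for a, b in tree_edges:
        cv = set(verts[a]) & set(verts[b])
        if proj(s[a], verts[a], cv) != proj(s[b], verts[b], cv):
            return False
    return True

verts = {"n1": ("x1", "x2"), "n2": ("x2", "x3")}
tree_edges = [("n1", "n2")]
s1 = {"n1": {(1, 1), (1, 2)}, "n2": {(1, 3)}}   # element s1 of Example 6
s2 = {"n1": {(1, 1)}, "n2": {(1, 3)}}           # element s2 of Example 5
```

On this data `is_canonical(s1, verts, tree_edges)` is false, since the two nodes disagree on the values of `x2`, whereas `s2` passes the check.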


The key theorem of tree decomposition is that a canonical element in
the abstract domain can be seen as the projection of a concrete set $S$ along
$\mathrm{VERTS}_X(n)$ for each node $n$ of the tree. To prove that, we will first establish a
useful property of a canonical element $s$.


Lemma 3. For every canonical element $s \in A$, node $n \in N$ and element $\mathbf{x}_n \in
s(n)$, we have that $\mathrm{ext}(\mathbf{x}_n) \cap \gamma(s) \neq \emptyset$.


Stated another way, the lemma claims that for any canonical $s$, any $\mathbf{x}_n \in s(n)$ can
be extended to form some element of $\gamma(s)$. A proof is provided in the extended
version.


Theorem 2. An element s is canonical (Definition 10) if and only if s = a(S) 
for some concrete set S. 


Ideally, in abstract interpretation, we would like to work with abstract
domain objects that satisfy $s = \alpha(\gamma(s))$. One way to ensure that is to take
any given domain element $s_0$ and simply calculate out $\alpha(\gamma(s_0))$ by applying the
maps. However, $\gamma(s_0)$ in our domain takes lower dimensional projections and
reconstructs a set in the full state space. It may thus be too expensive to
compute. Fortunately, canonical objects satisfy the equality $s = \alpha(\gamma(s))$. Therefore,
given any object $s \in A$ that is not necessarily canonical, we would like to make
it canonical: I.e., we seek an object $\hat{s}$ such that $\gamma(\hat{s}) = \gamma(s)$, but $\hat{s}$ is canonical. As
mentioned earlier, directly computing $\hat{s} = \alpha(\gamma(s))$ can be prohibitively
expensive, depending on the domain. We now describe a message passing approach.

First, we convert the tree $T$ to a rooted tree by designating an arbitrary node
$r \in N$ as the root of the tree.


Message Passing along Edges: Let $(n_1, n_2)$ be an edge of the tree and $s$ be
an abstract element. A message from $n_1$ to $n_2$ is defined as the set $\mathrm{msg}(s, n_1 \to
n_2) : \mathrm{proj}(s(n_1), \mathrm{CV}(n_1, n_2))$. In other words, we project the set $s(n_1)$ along the
dimensions that are common to $(n_1, n_2)$.

Once a node $n_2$ receives $M : \mathrm{msg}(s, n_1 \to n_2)$, it processes the message by
updating $s(n_2)$ as $s(n_2) := s(n_2) \cap \mathrm{ext}_{\mathrm{VERTS}(n_2)}(M)$. In other words, it intersects
the message (extended to the dimensions in $n_2$) with the current set that is
associated with $n_2$.


Example 7. Consider a tree decomposition with three nodes $\{n_1, n_2, n_3\}$ and the
edges $(n_1, n_2)$ and $(n_2, n_3)$. Let $\mathrm{VERTS}(n_1) : \{x_1, x_2\}$, $\mathrm{VERTS}(n_2) : \{x_2, x_4\}$ and
$\mathrm{VERTS}(n_3) : \{x_2, x_3\}$. Let $D$ be the domain $\{1, 2, 3, 4\}^4$. Consider the abstract
element $s$, with the tuples of $n_1$, $n_2$ and $n_3$ listed over $(x_1, x_2)$, $(x_2, x_4)$ and
$(x_2, x_3)$, respectively:


$n_1 \mapsto \{(1, 2), (3, 3), (1, 4)\}$, $n_2 \mapsto \{(1, 1), (2, 2), (3, 3), (4, 4)\}$, $n_3 \mapsto \{(4, 4), (2, 3)\}$.


A message $\mathrm{msg}(s, n_1 \to n_2)$ is given by the set $\mathrm{proj}(s(n_1), \{x_2\}) : \{2, 3, 4\}$.
This results in the new abstract object $s'$ wherein the element $(1, 1)$ is removed
from $s(n_2)$:


$n_1 \mapsto \{(1, 2), (3, 3), (1, 4)\}$, $n_2 \mapsto \{(2, 2), (3, 3), (4, 4)\}$, $n_3 \mapsto \{(4, 4), (2, 3)\}$.


Upwards Message Passing: The upwards message passing works from leaves 
up to the root of the tree according to the following two rules: 


1. First, each leaf $n$ of the tree passes a message to its parent $n_p$. The parent
node $n_p$ intersects its current value $s(n_p)$ with the message to update its
current set.

2. After a node has received (and processed) a message from all its children, it 
passes a message up to its parent, if one exists. 


The upwards message passing terminates at the root since it does not have 
a parent to send a message to. 


Example 8. Going back to Example 7, we designate $n_2$ as the root and the
upwards pass sends the messages $\mathrm{msg}(s, n_1 \to n_2)$ and $\mathrm{msg}(s, n_3 \to n_2)$. This
removes the elements $(1, 1)$ and $(3, 3)$ from $s(n_2)$ and results in the following
updated element:


$n_1 \mapsto \{(1, 2), (3, 3), (1, 4)\}$, $n_2 \mapsto \{(2, 2), (4, 4)\}$, $n_3 \mapsto \{(4, 4), (2, 3)\}$.


Downwards Message Passing: The downwards message passing works from
the root down to the leaves.


1. To initialize, the root sends a message to all its children. 
2. After a node has received (and processed) a message from its parent, it sends 
a message to all its children. 


The overall procedure to make a given abstract object s canonical is as fol- 
lows: (a) perform an upwards message passing phase and (b) perform a down- 
wards message passing phase. 


Example 9. Going back to Example 8, the downward message passing phase
sends messages from $n_2 \to n_1$ and $n_2 \to n_3$. The element $(3, 3)$ is removed from
$s(n_1)$, and the resulting element $\hat{s}$ is


$n_1 \mapsto \{(1, 2), (1, 4)\}$, $n_2 \mapsto \{(2, 2), (4, 4)\}$, $n_3 \mapsto \{(4, 4), (2, 3)\}$.


On the other hand, it is important to perform message passing upwards first and
then downwards second. Reversing this does not yield a canonical element. For
instance, going back to Example 7, if we first performed a downwards pass from
$n_2$, the result is unchanged:


$n_1 \mapsto \{(1, 2), (3, 3), (1, 4)\}$, $n_2 \mapsto \{(1, 1), (2, 2), (3, 3), (4, 4)\}$, $n_3 \mapsto \{(4, 4), (2, 3)\}$.


Performing an upwards pass now yields the element $s_2$:


$n_1 \mapsto \{(1, 2), (3, 3), (1, 4)\}$, $n_2 \mapsto \{(2, 2), (4, 4)\}$, $n_3 \mapsto \{(4, 4), (2, 3)\}$.


However, this is not canonical, since the element $(3, 3)$ in $s_2(n_1)$ violates the
requirement over the edge $(n_1, n_2)$.
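
The two-phase procedure (upwards, then downwards) can be sketched for finite relations as follows. This is an illustration rather than the implementation used in the paper: the names `send` and `canonicalize`, the `children` map describing the rooted tree, and the tuple encoding of each node's relation are all assumptions. The data is the element $s$ of Example 7 with $n_2$ as the root.

```python
def proj(rel, idx, common):
    """Project a relation over variables `idx` onto the variables in `common`."""
    keep = [idx.index(v) for v in sorted(common)]
    return {tuple(t[k] for k in keep) for t in rel}

def send(s, verts, src, dst):
    """Pass msg(s, src -> dst) and intersect it into s[dst] (in place)."""
    cv = set(verts[src]) & set(verts[dst])
    msg = proj(s[src], verts[src], cv)
    s[dst] = {t for t in s[dst] if proj({t}, verts[dst], cv) <= msg}

def canonicalize(s, verts, children, root):
    """Upwards pass (leaves to root) followed by a downwards pass."""
    s = {n: set(r) for n, r in s.items()}      # work on a copy
    order, stack = [], [root]                  # pre-order: parents before children
    while stack:
        n = stack.pop()
        order.append(n)
        stack.extend(children.get(n, []))
    parent = {c: p for p, cs in children.items() for c in cs}
    for n in reversed(order):                  # every node after all its descendants
        if n != root:
            send(s, verts, n, parent[n])
    for n in order:                            # every node after its parent
        if n != root:
            send(s, verts, parent[n], n)
    return s

verts = {"n1": ("x1", "x2"), "n2": ("x2", "x4"), "n3": ("x2", "x3")}
children = {"n2": ["n1", "n3"]}
s = {"n1": {(1, 2), (3, 3), (1, 4)},
     "n2": {(1, 1), (2, 2), (3, 3), (4, 4)},
     "n3": {(4, 4), (2, 3)}}
s_hat = canonicalize(s, verts, children, "n2")
```

Each tree edge carries exactly two messages, one per direction, so the number of projections and intersections is linear in the number of tree nodes.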


Let $\hat{s}$ be the resulting abstract object after the message passing procedure
finishes.


Theorem 3. The result of message passing $\hat{s}$ is a canonical object, and it
satisfies $\gamma(\hat{s}) = \gamma(s)$.


Proof (Sketch). First, we note that whenever a message is passed for an abstract
value $s$ from node $m$ to $n$ along an edge $(m, n)$ resulting in a new abstract value
$s'$: (P1) $\gamma(s') = \gamma(s)$; and (P2) the projection of $s'(n)$ along the dimensions
$\mathrm{CV}(m, n)$ is now contained in that of $s'(m)$ along $\mathrm{CV}(m, n)$. Furthermore,
property (P2) remains unchanged regardless of any future messages that are passed
along the tree edges.

Next, it is shown that during the upwards pass, whenever a message is passed,
property (P2) (stated above) holds for each node $m$ and its parent node $n$ since
a message is passed from $m$ to $n$. During the downwards pass, property (P2)
holds for each node $n$ and its child node $m$ in the tree. Combining the two,
we note that for each edge $(m, n)$ in the tree, we have property (P2) in either
direction guaranteeing that $\mathrm{proj}(s^*(m), \mathrm{CV}(m, n)) = \mathrm{proj}(s^*(n), \mathrm{CV}(m, n))$ for
the final result $s^*$, or in other words that $s^*$ is canonical.


3.3 Decomposable Sets and Post-conditions


We have already noted that for any concrete set $S \subseteq D$, the process of
abstracting it by projecting into nodes of a tree $T$, and re-concretizing it is
“lossy”: I.e., $S \subseteq \gamma(\alpha(S))$. In this section, we study “tree decomposable” concrete
sets $S$ for which $\gamma(\alpha(S)) = S$. Ideally, we would like to prove that if a set $S$ is
tree decomposable then so is the set $\mathrm{post}(S, \Pi)$ of next states. However, we will
disprove this by showing a counterexample. Nevertheless, we will present an
analysis of why this fact fails and suggest approaches that can “manage” this
loss in precision.


Definition 11 (Decomposable Sets). We say that a set $S$ is tree decomposable
given a tree $T$ iff $\gamma(\alpha(S)) = S$.


This is in fact a “global” definition of decomposability. In fact, a nice “local” 
definition can be provided that is reminiscent of the notion of conditional inde- 
pendence in graphical models. We will defer this discussion to an extended ver- 
sion of this paper due to space limitations. 


Example 10. Consider the set $S : \{(1, 2, 1), (2, 2, 2)\}$ and the tree $T$ below:

$n_1 : \{x_1, x_2\}$ -- $n_2 : \{x_2, x_3\}$


We wish to check if $S$ is $T$-decomposable. We have $s : \alpha(S)$ as


$s(n_1) : \mathrm{proj}(S, n_1) : \{(1, 2), (2, 2)\}$, $s(n_2) : \mathrm{proj}(S, n_2) : \{(2, 1), (2, 2)\}$.


Now, $\gamma(s) : \{(1, 2, 1), (1, 2, 2), (2, 2, 1), (2, 2, 2)\}$. We note that the set $S$ is
not tree decomposable. On the other hand, one can verify that the set
$S_1 : \{(1, 2, 2), (2, 2, 2)\}$ is tree decomposable.


The following lemma will be quite useful. 


Lemma 4. Let S1, S2 be tree decomposable sets over T. Their intersection is 
tree decomposable. 


Let $\Pi$ be a transition system over system variables in $\mathbf{x} \in D$. For a given set
$S \subseteq D$, let us define the post-condition $\mathrm{post}(S, \Pi)$ to be the set of states reachable
in one step starting from some state in $S$:


$\mathrm{post}(S, \Pi) : \{\mathbf{x}' \mid \mathbf{x} \in S, \mathbf{x}' = \mathrm{eval}(f, \mathbf{x})\}$.

Let us also consider a transition relation $R$ over pairs of states $(\mathbf{x}, \mathbf{x}') \in D \times D$:

$R = \{(\mathbf{x}, \mathbf{x}') \mid \mathbf{x}, \mathbf{x}' \in D \text{ and } \mathbf{x}' = \mathrm{eval}(f, \mathbf{x})\}$.


The relation $R$ can be viewed as the intersection of $n$ relations: $R : \bigcap_{x_j \in X} R_j$,
wherein


$R_j : \{(\mathbf{x}, \mathbf{x}') \mid \mathbf{x}, \mathbf{x}' \in D \text{ and } \mathbf{x}_j' = \mathrm{eval}(f_j, \mathbf{x})\}$.

In other words, $R_j$ is a component of $R$ that models the update of the system
variable $x_j$. Also for each $x_j \in X$, let $e_j : \mathrm{inps}(f_j) \cup \{x_j\}$ be the inputs to the
update function $f_j$ and the node $x_j$ itself.

Given the tree $T$, we define the extended tree $T'$ as having the same node
set $N$ and edge set $C$ as $T$. However, $\mathrm{VERTS}_{T'}(n) = \mathrm{VERTS}_T(n) \cup \{x_i' \mid x_i \in
\mathrm{VERTS}_T(n)\}$. Note that $T'$ with the labeling $\mathrm{VERTS}_{T'}$ satisfies all the conditions
of a tree decomposition for a graph $G$ save the addition of vertices $x_i'$ in each
node of the tree. We will write $\mathrm{VERTS}'(n)$ to denote the set $\mathrm{VERTS}_{T'}(n)$.


Lemma 5. The transition relation $R$ of a system $\Pi$ is tree $T'$ decomposable.


The proof is provided in the extended version and is done by writing R as an 
intersection of tree decomposable relations R;, and appealing to Lemma 4. 

First, we show the negative result that the image of a tree (T) decomposable 
set under a tree (T’) decomposable transition relation is not tree decomposable, 
in general. 


Example 11. Let $X = \{x_1, x_2, x_3\}$ and consider again the tree decomposition
from Example 10. Let $S$ be the set $\{(*, *, *)\}$, wherein we use the wild card
character $*$ as notation that can be substituted for any element in the set $\{1, 2\}$.
Therefore, we take $S$ to be a set with 8 elements. Clearly $S$ is tree decomposable
in the tree $T$ from Example 10.

Consider the transition relation $R$ that will be written as the intersection of
three transition relations:


$R_1 : \{(\mathbf{x}, \mathbf{x}') \mid x_1' = x_2\}$, $R_2 : \{(\mathbf{x}, \mathbf{x}') \mid x_2' \in \{1, 2\}\}$, $R_3 : \{(\mathbf{x}, \mathbf{x}') \mid x_3' = x_2\}$.


Clearly $R$ is tree $T'$ decomposable. We can now compute the post-condition
of $S$ under this relation. The reader can verify the post-condition $S' :
\{(1, *, 1), (2, *, 2)\}$. However, $S'$ is not tree decomposable. We note that $\hat{s} : \alpha(S')$
is the set $\hat{s}(n_1) : \{(*, *)\}$ and $\hat{s}(n_2) : \{(*, *)\}$. Therefore $\gamma(\hat{s})$ is the set $\{(*, *, *)\}$.
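
The counterexample can be replayed by brute force. The sketch below assumes the reading of the transition relation in which $x_1' = x_2$, $x_2'$ ranges freely over $\{1, 2\}$ and $x_3' = x_2$; the helper names and the tuple encoding of states are illustrative choices, not taken from the paper.

```python
from itertools import product

VALUES = (1, 2)
VERTS = {"n1": (0, 1), "n2": (1, 2)}            # positions of (x1, x2) and (x2, x3)
DOMAIN = set(product(VALUES, repeat=3))

def alpha(S):
    """Project S onto the variables of each tree node."""
    return {n: {tuple(x[i] for i in idx) for x in S} for n, idx in VERTS.items()}

def gamma(s):
    """States of the domain that agree with every node-wise projection."""
    return {x for x in DOMAIN
            if all(tuple(x[i] for i in idx) in s[n] for n, idx in VERTS.items())}

def post(S):
    """One step of R (assumed reading): x1' = x2, x2' free in {1, 2}, x3' = x2."""
    return {(x[1], v, x[1]) for x in S for v in VALUES}

S = set(DOMAIN)          # the set {(*, *, *)} with 8 elements
S_next = post(S)         # the set {(1, *, 1), (2, *, 2)}
```

Comparing `gamma(alpha(S_next))` with `S_next` shows the loss: the round trip returns all 8 states, whereas `S_next` has only 4, because the correlation between `x1'` and `x3'` is not visible in either node.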

As noted above, the set $R$ is tree $T'$ decomposable. If $S$ is tree decomposable,
we can extend $S$ to a set $S'' : \mathrm{ext}_{X'}(S)$ that is now defined over $X \cup X'$ and is
also tree decomposable. As a result $S'' \cap R$ is also tree decomposable. However,
the postcondition of $S$ is the set $\mathrm{proj}(S'' \cap R, X')$. Thus, the key operation that
failed was the projection operation involved in computing the post-condition.
This suggests a possible solution to this issue albeit an expensive one: at each
step, we maintain the reachable states using both current and next state
variables, thus avoiding projection. In effect, the reachable states at the $i$-th step will
be entire trajectories of the system expressed over variables $X_0 \cup X_1 \cup \cdots \cup X_i$.
This is clearly not practical. However, a more efficient solution is to note that
some of the current state variables can be projected out without losing the
tree decomposability property. Going back to Example 11, we note that we can
safely project away $\{x_1, x_3\}$, while maintaining the new reachable set in terms
of $(x_2, x_1', x_2', x_3')$. In this way, we may recover the lost precision.

In conclusion, we note that tree decompositions may lose precision over
post-conditions. However, the loss in precision can be avoided if carefully selected
“previous state variables” are maintained as the computation proceeds. The
question of how to optimally maintain this information will be investigated in
the future.


4 Grid-Based Interval Analysis 


We now combine the ideas to create a disjunctive interval analysis using tree 
decompositions. The main idea here is to apply tree decompositions not to the 
concrete set of states but to an abstraction of the concrete domain by grid-based 
intervals. 

We will now describe the interval-based abstraction of sets of states of a
dynamical system $\Pi$ in order to perform over-approximate reachability analysis. Let
us fix a system $\Pi : \langle X, W, D, \mathcal{W}, f, X_0, U \rangle$ as defined in Definition 1. We will
assume that the domain of state variables $D$ is a hyper-rectangle given by
$D : [L(x_1), U(x_1)] \times \cdots \times [L(x_n), U(x_n)]$ for $L(x_j), U(x_j) \in \mathbb{R}$ and $L(x_j) < U(x_j)$
for each $j = 1, \ldots, n$. In other words, each system variable $x_j$ lies inside the
interval $[L(x_j), U(x_j)]$. Likewise, we will assume that $\mathcal{W} : \prod_{k=1}^{m} [L(w_k), U(w_k)]$ such
that $L(w_k) < U(w_k)$ and $L(w_k), U(w_k) \in \mathbb{R}$.

We will consider a uniform cell decomposition wherein each dimension
is divided into some natural number $M > 0$ of equal sized subintervals. The
$i$-th subinterval of variable $x_j$ is denoted as $\mathrm{subInt}(x_j, i)$, and is given by
$[L(x_j) + i\delta_j, L(x_j) + (i+1)\delta_j]$ for $i = 0, \ldots, M-1$ and $\delta_j : \frac{U(x_j) - L(x_j)}{M}$.
Similarly, we will define $\mathrm{subInt}(w_k, i)$ for disturbance variables $w_k$ whose domains
are also divided into $M$ subdivisions. The overall domain $D \times \mathcal{W}$ is therefore
divided into $M^{n+m}$ cells wherein each cell is indexed by a tuple of natural
numbers $\mathbf{i} : (i_1, \ldots, i_n, i_{n+1}, \ldots, i_{n+m})$, such that $i_j \in \{0, \ldots, M-1\}$ and the cell
corresponding to $\mathbf{i}$ is given by:


$\gamma_C(\mathbf{i}) : \prod_{j=1}^{n} \mathrm{subInt}(x_j, i_j) \times \prod_{k=1}^{m} \mathrm{subInt}(w_k, i_{n+k})$   (1)
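
The uniform cell decomposition is simple to compute. The sketch below is an illustration with hypothetical function names; it treats system and disturbance variables uniformly by taking one list of bounds $(L, U)$ per dimension.

```python
from itertools import product

def sub_int(lo, hi, M, i):
    """i-th of M equal subintervals of [lo, hi], for i in 0..M-1."""
    delta = (hi - lo) / M
    return (lo + i * delta, lo + (i + 1) * delta)

def cell(bounds, M, index):
    """Hyper-rectangle for a cell index; `bounds` lists (L, U) per dimension."""
    return [sub_int(lo, hi, M, i) for (lo, hi), i in zip(bounds, index)]

def all_cells(num_dims, M):
    """Iterate over all M**num_dims cell indices."""
    return product(range(M), repeat=num_dims)
```

For instance, with bounds $[-3, 3]$ and $M = 6$, `sub_int(-3, 3, 6, 0)` is the interval from $-3$ to $-2$. The exponential count `M**num_dims` produced by `all_cells` is what makes a direct grid analysis expensive.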


Definition 12 (Grid-Based Abstract Domain). The grid based abstract
domain is defined by the set $C : \mathcal{P}(\{0, \ldots, M-1\}^{n+m})$, wherein each abstract
domain element is a set of grid cells. The sets are ordered simply by set inclusion
$\subseteq$ between sets of grid cells. The abstraction map $\alpha_C : \mathcal{P}(D) \to C$ is defined as
follows:


$\alpha_C(S) : \{\mathbf{i} \in \{0, \ldots, M-1\}^{n+m} \mid \gamma_C(\mathbf{i}) \cap S \neq \emptyset\}$.

The concretization map $\gamma_C$ is defined above in (1).


Definition 13 (Interval Propagator). An interval propagator (IP) is a
higher order function that takes in the description of a function $f$ with $k$
real-valued inputs and $p$ real-valued outputs, and an interval $I : [l_1, u_1] \times \cdots \times [l_k, u_k]$,
and outputs an interval (hyperrectangle over $\mathbb{R}^p$) $\mathrm{INTVLPROP}(f, I)$ such that the
following soundness guarantee holds:


$(\forall \mathbf{x} \in D)\ \mathbf{x} \in \prod_{j=1}^{k} [l_j, u_j] \Rightarrow \mathrm{eval}(f, \mathbf{x}) \in \mathrm{INTVLPROP}(f, I)$.


In practice, interval arithmetic approaches have been used to build sound 
interval propagators [33]. However, they suffer from issues such as the wrapping 
effect that make their outputs too conservative. This can be remedied by either 
(a) performing a finer subdivision of the inputs (i.e., increasing M) to ensure
that the intervals I being input into INTVLPROP are sufficiently small to
guarantee tight error bounds; or (b) using higher order arithmetics such as affine
arithmetic or Taylor polynomial arithmetic [25,32]. 
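The soundness requirement of Definition 13 can be illustrated with a very small interval arithmetic in Python. This is a sketch under stated assumptions: the `Interval` class and `intvl_prop` function are hypothetical names introduced here, only addition, subtraction and multiplication are supported, and floating-point rounding is ignored (a library such as MPFI rounds outward to stay sound).

```python
class Interval:
    """Closed interval [lo, hi]; naive arithmetic without outward rounding."""

    def __init__(self, lo, hi):
        if lo > hi:
            raise ValueError("empty interval")
        self.lo, self.hi = lo, hi

    def __add__(self, other):
        other = _lift(other)
        return Interval(self.lo + other.lo, self.hi + other.hi)

    __radd__ = __add__

    def __neg__(self):
        return Interval(-self.hi, -self.lo)

    def __sub__(self, other):
        return self + (-_lift(other))

    def __rsub__(self, other):
        return _lift(other) + (-self)

    def __mul__(self, other):
        other = _lift(other)
        p = [self.lo * other.lo, self.lo * other.hi,
             self.hi * other.lo, self.hi * other.hi]
        return Interval(min(p), max(p))

    __rmul__ = __mul__

    def contains(self, v):
        return self.lo <= v <= self.hi


def _lift(v):
    """Treat a plain number as a degenerate interval."""
    return v if isinstance(v, Interval) else Interval(v, v)


def intvl_prop(f, box):
    """Evaluate f on interval arguments; returns one interval per output of f."""
    return [_lift(out) for out in f(*box)]


# Hypothetical example: f(x, y) = (x + x*y, x - x) on [0, 1] x [-1, 1].
box = [Interval(0.0, 1.0), Interval(-1.0, 1.0)]
out = intvl_prop(lambda x, y: (x + x * y, x - x), box)
```

In this example the exact range of the first output is [0, 2] and that of the second is {0}, while the propagator returns [-1, 2] and [-1, 1]. The result is sound but conservative, which is why finer subdivisions or higher order arithmetics help.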

The interval propagator serves to define an abstract post-condition operation
over sets of cells $\hat{S} \subseteq C$. Given such a set, $\hat{S}$, we compute the post condition in
the abstract domain. Informally, the post condition is given (a) by iterating over
each cell in $\hat{S}$; and (b) computing the possible next cells using INTVLPROP.
Formally, we define the abstract post operation as follows:

$$\mathrm{post}_C(\hat{S}, \Pi) : \bigcup_{i \in \hat{S}} \alpha_C(\mathrm{INTVLPROP}(f, \gamma_C(i))).$$


Given this machinery, an abstract T-step reachability analysis is performed 
in the standard manner: (a) abstract the initial state; (b) compute post condi- 
tion for T steps; and (c) check for intersections of the abstract states with the 
abstraction of the unsafe set. We can also define and use widening operators to 
make the sequence of iterates converge. The grid based abstract domain can offer 
some guarantees with respect to the quality of the abstraction. For instance, we 
can easily bound the Hausdorff distance between the underlying concrete set 
and the abstraction as a function of the discretization sizes $\delta_j$. However, the
desirable properties come at a high computational cost since the number of cells 
grows exponentially in the number of system and disturbance variables. 
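The three steps of the abstract T-step analysis can be sketched in a few lines of Python. Everything specific in this sketch is an assumption made for illustration: the two-variable linear update, the domain, the value of M, and the function names. Because the toy update is linear, its interval image is computed exactly by `lin`; floating-point rounding is ignored.

```python
from itertools import product
import math

M = 20                                   # subdivisions per variable (assumed)
BOUNDS = [(-1.0, 1.0), (-1.0, 1.0)]      # hypothetical domain D for (x, y)

def gamma(cell):
    """Concretization: the box covered by a cell index."""
    box = []
    for (lo, hi), i in zip(BOUNDS, cell):
        d = (hi - lo) / M
        box.append((lo + i * d, lo + (i + 1) * d))
    return box

def alpha(box):
    """Abstraction: a set of cells covering the box, clipped to the domain."""
    ranges = []
    for (lo, hi), (a, b) in zip(BOUNDS, box):
        d = (hi - lo) / M
        first = max(0, math.floor((a - lo) / d))
        last = min(M - 1, math.floor((b - lo) / d))
        ranges.append(range(first, last + 1))
    return set(product(*ranges))

def lin(coeffs, box):
    """Exact interval image of sum(c * v) for v ranging over the box."""
    lo = sum(c * (l if c >= 0 else h) for c, (l, h) in zip(coeffs, box))
    hi = sum(c * (h if c >= 0 else l) for c, (l, h) in zip(coeffs, box))
    return (lo, hi)

def intvl_prop(box):
    """Interval propagator for the assumed update x := 0.5x + 0.4y, y := -0.7y."""
    return [lin((0.5, 0.4), box), lin((0.0, -0.7), box)]

def post(cells):
    """Abstract post: union of the abstractions of each cell's interval image."""
    out = set()
    for cell in cells:
        out |= alpha(intvl_prop(gamma(cell)))
    return out

def reach(init_box, unsafe_box, T):
    """Return True if no abstract state within T steps meets the unsafe cells."""
    unsafe = alpha(unsafe_box)
    current = alpha(init_box)
    for _ in range(T):
        if current & unsafe:
            return False
        current = post(current)
    return not (current & unsafe)

safe = reach([(0.5, 0.6), (0.5, 0.6)], [(0.9, 1.0), (-1.0, 1.0)], 15)
```

A result of `True` means the over-approximation never touches the unsafe cells; a result of `False` is inconclusive, since the abstraction may include spurious states.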


4.1 Tree Decomposed Analysis 


We now consider a tree-decomposed approach based on the concept of nodal 
abstractions. The key idea here is to perform the grid-based abstraction not on 
the full set of system and disturbance variables, but instead on individual nodal 
abstractions over a tree decomposition T. 


Definition 14 (Nodal Abstractions). A nodal abstraction
NODALABSTRACTION($\Pi$, n) corresponding to a node $n \in N$ is defined as follows:

1. The set of system variables are given by $X_n : \mathrm{VERTS}_X(n)$ with domain given
by $D_n : \mathrm{proj}(D, X_n)$.
2. The initial states are given by $\mathrm{proj}(X_0, X_n)$.
3. The unsafe set is given by $\mathrm{proj}(U, X_n)$.
4. The set of disturbance variables are $W_n : \mathrm{VERTS}_W(n)$ with domain given by
$\mathcal{W}_n : \mathrm{proj}(W, W_n)$.
5. The updates are described by a relation $R(X_n, X_n')$ that relates the possible
current states $X_n$ and next states $X_n'$. The relation is constructed as a
conjunction of assertions over variables $x_i, x_i'$ wherein $x_i \in X_n$.
(a) If the update $x_i := f_i(x, w)$ is associated with the node n, we add the
conjunct $x_i' = f_i(X_n, W_n)$, noting that the proper inputs to $f_i$ are contained
in VERTS(n).
(b) Otherwise, $x_i' \in \mathrm{proj}(D, \{x_i\})$, which simply states that the next state value
of the variable $x_i$ is some value in its domain.


Given a system $\Pi$, the nodal abstraction is a conservative abstraction, and
therefore, it preserves reachability properties.


Lemma 6. For any reachable state x of $\Pi$ at time t, its projection $\mathrm{proj}(x, X_n)$
is a reachable state of NODALABSTRACTION($\Pi$, n) at time t.


Since each nodal abstraction involves at most w+1 variables, the abstraction
at each node can involve at most $M^{w+1}$ cells where w is the tree width. Also,
note that a tree decomposition can be found with tree width w that has at most
|X| + |W| nodes. This implies that the number of nodal abstractions can be
bounded by |X| + |W|.
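The gap between the two cell counts is easy to check numerically. The short Python sketch below (function names are illustrative assumptions) evaluates the monolithic count $M^{n+m}$ against the bound of $(|X| + |W|) \cdot M^{w+1}$ cells over all nodal abstractions, using the Laub-Loomis figures reported in Sect. 5 (7 state variables, no disturbances, 100 subdivisions, tree width 3).

```python
def monolithic_cells(n_state, n_dist, M):
    """Number of cells when the full domain D x W is gridded."""
    return M ** (n_state + n_dist)

def tree_decomposed_bound(n_state, n_dist, M, width):
    """Upper bound: at most |X| + |W| nodes, each with at most M**(width+1) cells."""
    return (n_state + n_dist) * M ** (width + 1)

mono = monolithic_cells(7, 0, 100)             # 10**14 cells
bound = tree_decomposed_bound(7, 0, 100, 3)    # 7 * 10**8 cells
```

The bound is loose, since the analysis only stores the cells actually reached at each node, but it shows that the exponent depends on the tree width rather than on the total number of variables.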

Let $\Pi(n)$ : NODALABSTRACTION($\Pi$, n) be the nodal abstraction for tree
node $n \in N$. For each node $n \in N$, we instantiate a grid based abstract domain
for $\Pi(n)$ ranging over the variables $\mathrm{VERTS}_X(n)$. At the $i$-th step of the reachability
analysis, we maintain a map $s_i$ from each node n to a set of grid cells $s_i(n)$ defined
over VERTS(n).


1. Compute $\hat{s}_i(n) : \mathrm{post}_C(s_i(n), \Pi(n))$.
2. Make $\hat{s}_i$ canonical using message passing between nodes to obtain $s_{i+1}$.


The message passing is performed not over projections of concrete states but 
over cells belonging to the grid based abstract domain. Nevertheless, we can 
easily extend the soundness guarantees in Theorem 3 to conclude soundness of 
the composition. 

Once again, we can stop this process after T steps or use widening to force 
convergence. We now remark on a few technicalities that arise due to the way 
the tree decomposition is constructed. 


Intersections with Unsafe Sets: Checking for a non-empty intersection with 
the unsafe sets may require constructing concrete cells over the full dimensional 
space if the unsafe sets are not tree decomposable for the tree T. However in 
many cases, the unsafe states are specified as intervals over individual variables, 
which yields a tree decomposable set. In such cases, we need to intersect the 
abstraction at each node with the unsafe set and perform message passing to 
make it canonical before checking for emptiness. 
Handling Guards and Invariants: We have not discussed guards and invariants.
It is assumed that such guards and invariants are tree decomposable over
the tree T. In this case, we can check which abstract cells have a non-empty 
intersection with the guard using message passing. The handling of transition 
systems with guards and invariants will be discussed as part of future extensions. 


5 Experimental Evaluation 


In this section, we describe an experimental evaluation of our approach over 
a set of benchmark problems. Our evaluation is based on a C++-based prototype
implementation that can read in the description of a nonlinear dynamical
system over a set of system and disturbance variables. The dynamics can cur- 
rently include polynomials, rational functions and trigonometric functions. Our 
implementation uses the MPFI library to perform interval arithmetic over the 
grid cells [36]. We use the HTD library to compute tree decompositions [1]. The 
system then computes a time-bounded reachable set over the first T steps of the 
system’s execution. Currently, we plot the results and compare the reachable 
set estimates against simulation data. We also compare the reachable sets com- 
puted by the tree decomposition approach against an approach without using 
tree decompositions. However, we note that the latter approach timed out on 
systems beyond 4 state variables. 

Table 1 presents the results over a small set of challenging nonlinear systems
benchmarks along with a comparison to two other approaches: (a) the approach
without tree decomposition and (b) the tool SAPO [22], which computes time
bounded reachable sets for polynomial systems using the technique of parallelotope
bundles described by Dreossi et al. [23]. The benchmarks range from 3 to 20
state variables. We describe the sources for
each benchmark where appropriate. Note that the SAPO tool does not handle
nonpolynomial dynamics or time varying disturbances at the time of writing.

The treewidths range from 1 for the simplest system (Example 1) to 3 for the 
7-state Laub Loomis oscillator example [30]. We note that the tree decomposition 
was constructed within 0.01s for all the examples. We also note that systems 
with as many as 20 state variables are handled by our approach whereas the 
monolithic approach cannot handle systems beyond 4 state variables. We now 
compare the results of our approach to that of the monolithic approach on the 
two cases where the latter approach completed. 


System # 1: Consider again the system from Example 1 with 3 state variables 
and 1 disturbance. We have already noted a tree decomposition of tree width 1 
for this example. 


System # 2: In this example, we consider a system over 4 state variables
{x, y, z, w} and one disturbance variable $w_1$.


$x := 0.5x + y + 0.05xy - w_1$, $y := -0.7y - 0.08z$, $z := z - 0.4y$,
$w := w - 0.05xw$
Table 1. Results on benchmark examples. |X|: number of state variables, |W|: number
of disturbance variables, Tree Decomp.: reachability using tree decompositions,
Monolithic: reachability analysis without tree decompositions, SAPO: number of directions
(|L|), number of bundles (|T|) and running time. All timings are reported in seconds
on a MacBook Pro laptop running macOS 10.14 with 16 GB RAM and a 3.4 GHz Intel
Core i7 processor. Reachability analysis was carried out for 15 time steps.


Name                      |X|   |W|   Tree    Tree Decomp.        Monolithic            SAPO
                                      Width   Time      # Cells   Time        # Cells   (|L|,|T|)   Time

System # 1                3     1     1       14.4      0.22M     1047.6      7.6M      -n/a-
System # 2                4     1     2       2T        24K       652         3.1M      -n/a-
SIR [23,40]               3     0     1       4.1       95K       143         2M        (3,1)       0.1
1D-Lattice-10 [39]        10    0     2       99        1.1M      TO (1.5h)             (16,6)      679
Ebola-epidemic [14]       5     0     2       799.4     1.9M      TO (1.5h)             (5,5)       0.02
p53-gene-reg [31]         6     0     2       135.8     98K       TO (1.5h)             -n/a-
Influenza-epidemic [22]   4     0     2       517.9     1.4M      TO (1.5h)             (7,4)       0.1
Coupled-vanderpol         6     0     2       10.5      0.1M      TO (1.5h)             (10,5)      2.5
Laub-Loomis [20,30]       7     0     3       1755.1    2.6M      TO (1.5h)             (12,6)      1.8
Honeybee* [9,23]          6     la    3       206.1     2.1M      TO (1.5h)             (8,4)       10.7
Phosporelay [22]          7     0     3       1566.2    7.5M      TO (1.5h)             (10,4)      1.2
Coord. Vehicles (1)       5     1     2       150.2     0.5M      TO (1.5h)             -n/a-
Coord. Vehicles (2)       10    2     2       1175.2    2M        TO (1.5h)             -n/a-
Coord. Vehicles (4)       20    4     2       2206.7    3.9M      TO (1.5h)             -n/a-


The domains include $(x, y, z, w) \in [-1, 1]^4$ and are divided into $16 \times 10^8$ grid
cells (200 for each state variable). The disturbance $w_1 \in [-0.1, 0.1]$. The initial
conditions are $x \in [0.08, 0.16]$, $y \in [-0.16, -0.05]$, $z \in [0.12, 0.31]$ and
$w \in [-0.15, -0.1]$. We obtain a tree decomposition of width 2, wherein the
nodes include $n_1 : \{x, y, w_1\}$, $n_2 : \{y, z\}$ and $n_3 : \{x, w\}$ with the edges $(n_1, n_2)$
and $(n_1, n_3)$.

Figure 1 compares the resulting reachable sets for the tree decomposed reach- 
ability analysis versus the monolithic approach. We note differences between the 
two reachable sets but the loss in precision is not significant. 


Coordinated Vehicles: In this example, we study nonlinear models of
vehicles executing coordinated turns. Each vehicle has states $(x_i, y_i, v_{x,i}, v_{y,i}, \omega_i)$,
representing positions, velocities and the rate of change in the yaw angle, respectively,
with a disturbance $w_i$. The dynamics are given by


$x_i = x_i + 0.1 v_{x,i}$, $y_i = y_i + 0.1 v_{y,i}$, $v_{x,i} = v_{x,i} + 0.1 v_{x,i} \cos(0.1\omega_i) - 0.1 v_{y,i} \sin(0.1\omega_i)$,
$\omega_i = 0.5\omega_i + 0.5\omega_0 + 0.1 w_i$


The vehicles are loosely coupled with $\omega_i$ representing the turn rate of the
$i$-th vehicle and $\omega_0$ that of the “lead” vehicle. The $i$-th vehicle tries to gradually
align its turn rate to that of the lead vehicle. This model represents a simple
scenario of loosely coupled systems that interact using a small set of state variables.
Applications include models of cardiac cells that are also loosely coupled
through shared action potentials [26]. The variables $x_i, y_i$ are set in the domain
[-15, 15] and subdivided into 300 parts along each dimension. Similarly, the
velocities range over [-10, 10] and are subdivided into 500 parts each, and the
yaw rate ranges over [-0.2, 0.2] radians/sec and is subdivided into 25 parts. The
disturbance ranges over [-0.1, 0.1]. Table 1 reports results from models involving
1, 2 and 4 vehicles. Since they are loosely coupled, the treewidth of these models
is 2.

Fig. 1. Reachable set projections (shaded blue) for System # 2 (left) and the SIR
model [22] (right). Top: tree decomposition approach and Bottom: monolithic approach
without tree decompositions. Reachable sets are identical for the SIR model. Note the
difference in range of z for System # 2. The red dots show the results of simulations.
(Color figure online)


Laub-Loomis Model: The Laub-Loomis model is a molecular network that 
produces spontaneous oscillations for certain values of the model parameters. 
The model’s description was taken from Dang et al. [20]. The system has 7 state
variables each of which was subdivided into 100 cells yielding a large state space
with $10^{14}$ cells. We note that the tree width of the graph is 3, yielding nodes
with up to 4 variables in them.


Comparison with SAPO. SAPO is a state-of-the-art tool that uses polytope
bundles and Bernstein polynomials to represent and propagate reachable sets
for polynomial dynamical systems [22,23]. We compare our approach directly against
SAPO for identical models and initial sets. Note that SAPO does not currently
handle non-polynomial models or models with time-varying disturbances. Table 1
shows that SAPO is orders of magnitude faster on all the models, with the sole
exception of the 1D-Lattice-10 model. Figure 2 shows the comparison of the
reachable sets computed by our approach (shaded blue region) against those
computed by SAPO (black rectangles) for five different models. We note that
for three of the models compared, neither reachable set is contained in the other.
For the one dimensional lattice model, SAPO produces a better reachable set,
whereas our approach is better for the influenza model. We also note that
for our approach the precision can be improved markedly by increasing the
number of subdivisions, albeit at a large computational cost that depends on
the treewidth of the model. The same is true for SAPO, where the number of
directions and the template sizes have a non-trivial impact on running time.

Fig. 2. Comparison of various projections of the reachable sets computed by our
approach shown in blue, the reachable set computed by SAPO shown as black rectangles
and states obtained through random simulation shown in red dots. Top row: ebola
model, second row: phosporelay, third row: 1d-lattice-10, fourth row: vanderpol (35
steps) and bottom row: influenza model. (Color figure online)


Models with Large Treewidths. We briefly report on a few models that we 
attempted with large treewidths. For such models, our approach of decomposing 
the space into cells becomes infeasible due to the curse of dimensionality. 

A model of how honeybees select between different sites [9,23] has 6 vari- 
ables and its tree width is 5 with a single tree node containing all state vari- 
ables. However, the large treewidth is due to two terms in the model which are 
replaced by disturbance variables that overapproximate their value. This brings 
down the treewidth to 3, making it tractable for our approach. Details of this 
transformation are discussed in our extended version. Treewidth reduction using 
abstractions is an interesting topic for future work. 

We originally proposed to analyze a 2D grid lattice model taken from Vleck
et al. [39]. However, a 2D 10 × 10 lattice model has a dependency hypergraph that
forms a 10 × 10 grid with treewidth 10. Likewise, the 17-state crazyflie benchmark
for SAPO [22] could not be analyzed by our approach since its treewidth is too
large.


6 Conclusions 


We have shown how tree decompositions can define an abstract domain that 
projects concrete sets along the various subsets of state variables. We showed 
how message passing can be used to exchange information between these subsets. 
We analyze the completeness of our approach and show that the abstraction is 
lossy due to the projection operation. We show that for small tree width models,
a gridding-based analysis of nonlinear systems can be used whereas such
approaches are too expensive when applied in a monolithic fashion. For the
future, we plan to study tree decompositions for abstract domains such as dis- 
junctions of polyhedra, parallelotope bundles and Taylor models. The process of 
model abstraction to reduce treewidth is another interesting future possibility. 
Abstract. We address the problem of synthesizing a controller for nonlinear
systems with reach-avoid requirements. Our controller consists of
a reference controller and a tracking controller which drives the actual 
trajectory to follow the reference trajectory. We identify a type of refer- 
ence trajectory such that the tracking error between the actual trajectory 
of the closed-loop system and the reference trajectory can be bounded. 
Moreover, such a bound on the tracking error is independent of the ref- 
erence trajectory. Using such bounds on the tracking error, we propose 
a method that can find a reference trajectory by solving a satisfiability 
problem over linear constraints. Our overall algorithm guarantees that 
the resulting controller can make sure every trajectory from the initial 
set of the system satisfies the given reach-avoid requirement. We also 
implement our technique in a tool FACTEST. We show that FACTEST 
can find controllers for four vehicle models (3-6 dimensional state space
and 2-4 dimensional input space) across eight scenarios (with up to 22
obstacles), all with running time in the sub-second range.


1 Introduction 


Design automation and safety of autonomous systems is an important research 
area. Controller synthesis aims to provide correct-by-construction controllers 
that can guarantee that the system under control meets certain requirements. 
Controller synthesis is a type of program synthesis problem. The synthesized 
program or controller g has to meet the given requirement R, when it is run in 
(closed-loop) composition with a given physical process or plant A. Therefore, a
synthesis algorithm has to account for the combined behavior of g and A. 

Methods for designing controllers for asymptotic requirements like stability, 
robustness, and tracking, predate the algorithmic synthesis approaches for pro- 
grams [3, 16,30]. However, these classic control design methods normally do not 
provide formal guarantees in terms of handling bounded-horizon requirements 
like safety. Typical controller programs are small, well-structured, and at core, 
have a succinct logic (“bang-bang” control) or mathematical operations (PID 
control). This might suggest that controllers could be an attractive target for 
algorithmic synthesis for safety, temporal logic (TL), and bounded time require- 
ments [1,9,18,34,38]. 

On the other hand, motion planning (MP), which is an instance of controller
synthesis for robots, is notoriously difficult (see [21] Chapter 6.5). A typical
MP requirement is to make a robot A track certain waypoints while meeting
some constraints. A popular paradigm in MP, called sampling-based MP, gives
practical, fully automatic, randomized solutions to hard problem instances by
only considering the geometry of the vehicle and the free space [14,15,20,21].
However, they do not ensure that the dynamic behavior of the vehicle will actually
follow the planned path without running into obstacles. Ergo, MP continues
to be a central problem in robotics¹.

In this paper, we aim to achieve faster control synthesis with guarantees by
exploiting a separation of concerns that exists in the problem: (A) how to drive
a vehicle/plant to a given waypoint? and (B) which waypoints to choose for
achieving the ultimate goal? (A) can be solved using powerful control theoretic
techniques, if not completely automatically, then at least in a principled fashion,
with guarantees, for a broad class of A’s. Given a solution for (A), we solve
(B) algorithmically. A contribution of the paper is to identify characteristics
of a solution of (A) that make solutions of (B) effective. Consider nonlinear
control systems $A : \dot{x} = f(x, u)$ and reach-avoid requirements defined by a
goal set G that the trajectories should reach, and obstacles O the trajectories
should avoid. The above separation leads to a two step process: (A) Find a
state feedback tracking controller $g_{trk}$ that drives the actual trajectory of the
closed-loop system $\xi_g$ to follow a reference trajectory $\xi_{ref}$. (B) Design a reference
controller $g_{ref}$, which consists of a reference trajectory $\xi_{ref}$ and a reference input
$u_{ref}$. The distance between $\xi_g$ and $\xi_{ref}$ is called the tracking error e. If we can
somehow know beforehand the value of e without knowing $\xi_{ref}$, we can use such
error to bloat O and shrink G, and then synthesize $\xi_{ref}$ such that it is e away
from the obstacles (inside the goal set). For linear systems, this was the approach
used in [7], but for nonlinear systems, the tracking error e will generally change
with $\xi_{ref}$, and the two steps get entangled.
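The idea of bloating O and shrinking G by a known error bound e can be sketched for polytopes given as {x | Hx ≤ b}. The sketch below is an assumption made for illustration and not necessarily the construction used later in this paper: it offsets each halfspace by e times the 2-norm of its normal. With e > 0 the result contains every point within distance e of the original polytope; with e < 0 it contains only points whose ball of radius |e| lies inside the original polytope. The names `offset` and `in_polytope` are hypothetical.

```python
import math

def offset(H, b, e):
    """Return b' with b'_i = b_i + e * ||H_i||_2 (e > 0 bloats, e < 0 shrinks)."""
    return [bi + e * math.sqrt(sum(h * h for h in row)) for row, bi in zip(H, b)]

def in_polytope(H, b, x):
    """True if every row inequality H_i . x <= b_i holds."""
    return all(sum(h * xi for h, xi in zip(row, x)) <= bi for row, bi in zip(H, b))

# Hypothetical obstacle/goal: the unit square [0, 1]^2 as four halfspaces.
H = [[1, 0], [-1, 0], [0, 1], [0, -1]]
b = [1, 0, 1, 0]

bloated_b = offset(H, b, 0.5)     # obstacle bloated by a tracking error of 0.5
shrunk_b = offset(H, b, -0.25)    # goal shrunk by a tracking error of 0.25
```

A reference trajectory that avoids the bloated obstacles and ends in the shrunk goal leaves a margin of e on both sides, which is what the tracking error bound is meant to absorb.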

For a general class of nonlinear vehicles (such as cars, drones, and underwater
vehicles), the tracking controller $g_{trk}$ is always designed to minimize the tracking
error. The convergence of the error can be proved by a Lyapunov function for
certain types of $\xi_{ref}$. We show how, under reasonable assumptions, we can use
Lyapunov functions to bound the value of the tracking error even when the
waypoints change (Lemma 2). This error bound is independent of $\xi_{ref}$ so long as
$\xi_{ref}$ satisfies the assumptions. For step (B) we introduce a SAT-based trajectory
planning method to find such $\xi_{ref}$ and $u_{ref}$ by solving a satisfiability (SAT)
problem over quantifier free linear real arithmetic (Theorem 1). Moreover, the
number of constraints in the SMT problem scales linearly with the
number of obstacles (and not with the vehicle model). Thus, our methods can
scale to complex requirements and high dimensional systems.


1 In the most recent International Conference on Robotics and Automation, among
the 3,512 submissions, “Path and motion planning” was the second most popular key
phrase.

Putting it all together, our final synthesis algorithm (Algorithm 2) guarantees 
that any trajectory following the synthesized reference trajectory will satisfy the 
reach-avoid requirements. The resulting tool FACTEST is tested with four non- 
linear vehicle models and on eight different scenarios, taken from MP literature, 
which cover a wide range of 2D and 3D environments. Experiment results show 
that our tool scales very well: it can find the small covers $\{\Theta_j\}_j$ and the corresponding
reference trajectories and control inputs satisfying the reach-avoid
requirements most often in less than a second, even with up to 22 obstacles. We 
have also compared our SAT-based trajectory planner to a standard RRT plan- 
ner, and the results show that our SAT-based method resoundingly outperforms 
RRT. To summarize, our main contributions are: 


1. A method (Algorithm 2) for controller synthesis separating tracking controller
$g_{trk}$ and search for reference controller $g_{ref}$.
2. Sufficient conditions for tracking controller error performance that make the
decomposition work (Lemma 2 and Lemma 3).
3. An SMT-based effective method for synthesizing reference controller $g_{ref}$.
4. The FACTEST implementation of the above and its evaluation showing very
encouraging results in terms of finding controllers that make any trajectories
of the closed-loop system satisfy reach-avoid requirements (Sect. 6).


Related Works. Model Predictive Control (MPC). MPC [4,25,45,49] has to
solve a constrained, discrete-time, optimal control problem. MPC for controller
synthesis typically requires model reduction for casting the optimization problem
as an LP [4], QP [2,36], or MILP [33,34,45]. However, when the plant model is
nonlinear [8,22], it may be hard to balance speed and complex requirements as
the optimization problem becomes nonconvex and nonlinear.


Discrete Abstractions. A discrete, finite-state abstraction of the control system is
computed, and then a discrete controller is synthesized by solving a two-player 
game [10,17,24,42,47]. CoSyMA [28], Pessoa [37], LTLMop [18,46], Tulip [9,48], 
and SCOTS [38] are based on these approaches. The discretization step often 
leads to a severe state space explosion for higher dimensional models. 


Safe Motion Planning. The idea of bounding the tracking error through
precomputation has been used in several techniques: FastTrack [11] uses
Hamilton-Jacobi reachability analysis to produce a “safety bubble” around planned paths.
Reachability based trajectory design for dynamical environments (RTD) [44]
computes offline forward reachable sets to guarantee that the robot is
not-at-fault in any collision. In [40], a technique based on convex optimization is
used to compute tracking error bounds. Another technique [23,43] uses motion
primitives expanded by safety funnels, which define similar ideas of safety tubes.


Sampling Based Planning. Probabilistic Road Maps (PRM) [15], Rapidly- 
exploring Random Trees (RRT) [19], and fast marching tree (FMT) [12] are 
widely used in actual robotic platforms. They can generate feasible trajectories 
through known or partially known environments. Compared with the determin- 
istic guarantees provided by our proposed method, these methods come with 
stochastic guarantees. Also, they are not designed to be robust to model
uncertainty or disturbances. MoveIT [5] is a tool designed to implement and
benchmark various motion planners on robots. The motion planners in MoveIT are
from the open motion planning library (OMPL) [41], which implements motion
planners abstractly.


Controlled Lyapunov Function (CLF). CLFs have been used to guarantee that
the overall closed-loop controlled system satisfies a reach-while-stay specifica- 
tion [35]. Instead of asking for a CLF for the overall closed-loop system, our 
method only needs a Lyapunov function for the tracking error, which is a weaker 
local requirement. CLF is often a difficult requirement to meet for nonlinear vehi- 
cle models. 


2 Preliminaries and Problem Statement 


Let us denote real numbers by $\mathbb{R}$, non-negative real numbers by $\mathbb{R}_{\geq 0}$, and natural
numbers by $\mathbb{N}$. The n-dimensional Euclidean space is $\mathbb{R}^n$. For a vector $x \in \mathbb{R}^n$,
$x^{(i)}$ is the $i$-th entry of x and $\|x\|_2$ is the 2-norm of x. For any matrix $A \in \mathbb{R}^{n \times m}$,
$A^T$ is its transpose; $A^{(i)}$ is the $i$-th row of A. Given an $r \geq 0$, an r-ball around
$x \in \mathbb{R}^n$ is defined as $B_r(x) = \{x' \in \mathbb{R}^n \mid \|x' - x\|_2 \leq r\}$. We call r the radius
of the ball. Given a matrix $H \in \mathbb{R}^{r \times n}$ and a vector $b \in \mathbb{R}^r$, an (H,b)-polytope
is denoted by $\mathrm{Poly}(H, b) = \{x \in \mathbb{R}^n \mid Hx \leq b\}$. Each row of the inequality
$H^{(i)} x \leq b^{(i)}$ defines a halfspace. We also call $H^{(i)} x = b^{(i)}$ the surface of the
polytope. Let $dP(H) = r$ denote the number of rows in H. Given a set $S \subseteq \mathbb{R}^n$,
the radius of S is defined as $\sup_{x, y \in S} \|x - y\|_2 / 2$.
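These definitions translate directly into membership tests. The following Python sketch is illustrative only; the function names are assumptions, the polytope is a hypothetical unit square, and `radius` is evaluated on a finite point set (for a polytope, the supremum is attained at its vertices).

```python
import math

def in_polytope(H, b, x):
    """True if H x <= b holds row by row, i.e. x lies in Poly(H, b)."""
    return all(sum(h * xi for h, xi in zip(row, x)) <= bi for row, bi in zip(H, b))

def in_ball(center, r, x):
    """True if x lies in the r-ball B_r(center) under the 2-norm."""
    return math.dist(center, x) <= r

def radius(points):
    """Half of the largest pairwise 2-norm distance in a finite point set."""
    return max(math.dist(p, q) for p in points for q in points) / 2

# Unit square [0, 1]^2 written as Poly(H, b) with four halfspaces.
H = [[1, 0], [-1, 0], [0, 1], [0, -1]]
b = [1, 0, 1, 0]
corners = [(0, 0), (1, 0), (0, 1), (1, 1)]
```

Here dP(H) = 4, and the radius of the square is half its diagonal.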


State Space and Workspace. The state space of control systems will be a subspace
$X \subseteq \mathbb{R}^n$. The workspace is a subspace $W \subseteq \mathbb{R}^d$, for $d \in \{2, 3\}$, which is the
physical space in which the robots have to avoid obstacles and reach goals.
Given a state vector $x \in X$, its projection to W is denoted by $x \downarrow p$. That is,
$x \downarrow p = [p_x, p_y]^T \in \mathbb{R}^2$ for ground vehicles on the plane and $x \downarrow p = [p_x, p_y, p_z]^T \in \mathbb{R}^3$
for aerial and underwater vehicles. When x is clear from context we will write
$x \downarrow p$ as simply p. The vector x may include other variables like velocity, heading,
pitch, etc., but p only has the position in Cartesian coordinates. We assume that
the goal set $G := \mathrm{Poly}(H_G, b_G)$ and the unsafe set O (obstacles) are specified by
polytopes in W; $O = \cup_i O_i$, where $O_i := \mathrm{Poly}(H_{O,i}, b_{O,i})$ for each obstacle i.
Trajectories and Reach-Avoid Requirements. A trajectory $\xi$ over X of duration T
is a function $\xi : [0, T] \to X$ that maps each time t in the time domain [0, T] to a
point $\xi(t) \in X$. The time bound or duration of $\xi$ is denoted by $\xi.\mathrm{ltime} = T$. The
projection of a trajectory $\xi : [0, T] \to X$ to W is written as $\xi \downarrow p : [0, T] \to W$ and
defined as $(\xi \downarrow p)(t) = \xi(t) \downarrow p$. We say that a trajectory $\xi(t)$ satisfies a reach-avoid
requirement given by unsafe set O and goal set G if $\forall t \in [0, \xi.\mathrm{ltime}]$, $\xi(t) \downarrow p \notin O$
and $\xi(\xi.\mathrm{ltime}) \downarrow p \in G$. See Fig. 1 for an example.

Given a trajectory $\xi : [0, T] \to X$ and a time $t > 0$, the time shift of $\xi$ is a
function $(\xi + t) : [t, t + T] \to X$ defined as $\forall t' \in [t, t + T]$, $(\xi + t)(t') = \xi(t' - t)$.
Strictly speaking, for $t > 0$, $\xi + t$ is not a trajectory. The concatenation of two
trajectories $\xi_1 \frown \xi_2$ is a new trajectory in which $\xi_1$ is followed by $\xi_2$. That is, for
each $t \in [0, \xi_1.\mathrm{ltime} + \xi_2.\mathrm{ltime}]$, $(\xi_1 \frown \xi_2)(t) = \xi_1(t)$ when $t \leq \xi_1.\mathrm{ltime}$, and equals
$\xi_2(t - \xi_1.\mathrm{ltime})$ when $t > \xi_1.\mathrm{ltime}$. Trajectories are closed under concatenation,
and many trajectories can be concatenated in the same way.
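Time shift and concatenation can be written down directly from these definitions. The Python sketch below is a minimal illustration: the `Trajectory` class and the helper names are assumptions introduced here, and the two example segments are hypothetical straight-line motions in the plane.

```python
class Trajectory:
    """A function of time on [0, ltime], stored with its duration."""

    def __init__(self, fn, ltime):
        self.fn, self.ltime = fn, ltime

    def __call__(self, t):
        if not 0 <= t <= self.ltime:
            raise ValueError("time outside the trajectory's domain")
        return self.fn(t)


def time_shift(xi, t):
    """Return (xi + t), a function on [t, t + xi.ltime] with (xi + t)(t') = xi(t' - t)."""
    return lambda t_prime: xi(t_prime - t)


def concat(xi1, xi2):
    """Concatenation: follow xi1 up to xi1.ltime, then xi2 shifted by xi1.ltime."""
    def fn(t):
        return xi1(t) if t <= xi1.ltime else xi2(t - xi1.ltime)
    return Trajectory(fn, xi1.ltime + xi2.ltime)


# Hypothetical example: move right for 2 time units, then up for 1 time unit.
xi1 = Trajectory(lambda t: (t, 0.0), 2.0)
xi2 = Trajectory(lambda t: (2.0, t), 1.0)
xi = concat(xi1, xi2)
```

The example segments agree at the junction, so the concatenated trajectory is continuous; the definition itself does not require this.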


2.1 Nonlinear Control System 


Definition 1. An (n,m)-dimensional control system A is a 4-tuple $\langle X, \Theta, U, f \rangle$
where (i) $X \subseteq \mathbb{R}^n$ is the state space, (ii) $\Theta \subseteq X$ is the initial set, (iii) $U \subseteq \mathbb{R}^m$
is the input space, and (iv) $f : X \times U \to X$ is the dynamic function that is
Lipschitz continuous with respect to the first argument.


A control system with no inputs (m = 0) is called a closed system. 

Let us fix a time duration $T > 0$. An input trajectory $u : [0, T] \to U$ is a
continuous trajectory over the input space U. We denote the set of all possible
input trajectories to be $\mathcal{U}$. Given an input signal $u \in \mathcal{U}$ and an initial state
$x_0 \in \Theta$, a solution of A is a continuous trajectory $\xi_u : [0, T] \to X$ that satisfies
(i) $\xi_u(0) = x_0$ and (ii) for any $t \in [0, T]$, the time derivative of $\xi_u$ at t satisfies
the differential equation:

$$\frac{d}{dt}\xi_u(t) = f(\xi_u(t), u(t)). \qquad (1)$$

For any $x_0 \in \Theta$, $u \in \mathcal{U}$, $\xi_u$ is a state trajectory and we call such a pair $(\xi_u, u)$ a
state-input trajectory pair.

A reference state trajectory (or reference trajectory for brevity) is a trajectory
over X that the control system tries to follow. We denote reference trajectories
by $\xi_{ref}$. Similarly, a reference input trajectory (or reference input) is a trajectory
over U and we denote them as $u_{ref}$. Note that $\xi_{ref}$ and $u_{ref}$ are not necessarily
solutions of (1). Figure 1 shows reference and actual solution trajectories.

We call a reference trajectory $\xi_{ref}$ and a reference input $u_{ref}$ together a
reference controller $g_{ref}$. Given $g_{ref}$, a tracking controller $g_{trk}$ is a function that
is used to compute the inputs for A so that in the resulting closed system, the
state trajectories try to follow $\xi_{ref}$.


Definition 2. Given an (n,m)-dynamical system A, a reference trajectory $\xi_{ref}$,
and a reference input $u_{ref}$, a tracking controller for the triple $\langle A, \xi_{ref}, u_{ref} \rangle$ is a
(state feedback) function $g_{trk} : X \times X \times U \to U$.

At any time t, the tracking controller $g_{trk}$ takes in a current state of the system
x, a reference trajectory state $\xi_{ref}(t)$, and a reference input $u_{ref}(t)$, and gives an
input $g_{trk}(x, \xi_{ref}(t), u_{ref}(t)) \in U$ for A. The controller g for A is determined by
both the reference controller $g_{ref}$ and the tracking controller $g_{trk}$. The resulting
trajectory $\xi_g$ of the closed control system (A closed with $g_{ref}$ and $g_{trk}$) satisfies:

$$\frac{d}{dt}\xi_g(t) = f(\xi_g(t), g_{trk}(\xi_g(t), \xi_{ref}(t), u_{ref}(t))), \quad \forall t \in [0, T] \setminus D, \qquad (2)$$

where D is the set of points in time where the second or third argument of $g_{trk}$
is discontinuous².


2.2 Controller Synthesis Problem 


Definition 3. Given an (n,m)-dimensional nonlinear system $A = \langle X, \Theta, U, f \rangle$,
its workspace W, goal set $G \subseteq W$ and the unsafe set $O \subseteq W$, we are required to
find (a) a tracking controller $g_{trk}$, (b) a partition $\{\Theta_j\}_j$ of $\Theta$, and (c) for each
partition $\Theta_j$, a reference controller $g_{j,ref}$, which consists of a state trajectory $\xi_{j,ref}$
and an input trajectory $u_{j,ref}$, such that $\forall x_0 \in \Theta_j$, the unique trajectory $\xi_g$ of the
closed system as in Eq. (2) starting from $x_0$ reaches G and avoids O.


Again, $\xi_{j,ref}$ and $u_{j,ref}$ in $g_{j,ref}$ are not required to be a state-input pair, but,
for each initial state $x_0 \in \Theta_j$, the closed loop trajectory $\xi_g$ following $\xi_{j,ref}$ is a
valid state trajectory with corresponding input u generated by $g_{trk}$ and $g_{j,ref}$. In
this paper, we will decompose the controller synthesis problem: Part (a) will be
delivered by design engineers with knowledge of vehicle dynamics, and parts (b)
and (c) will be automatically synthesized by our algorithm, the latter being the
main contribution of the paper.


Example 1. Consider a ground vehicle moving on a 2D workspace $W \subseteq \mathbb{R}^2$ as
shown in Fig. 1. This scenario is called Zigzag and it is adopted from [32]. The red
polytopes are obstacles. The blue and green polytopes are the initial set $\Theta$
and the goal set G. There are also obstacles (not shown in the figure) defining the
boundaries of the entire workspace. The black line is a projection of a reference
trajectory to the workspace: $\xi_{ref}(t) \downarrow p$. This would not be a feasible state
trajectory for a ground vehicle that cannot make sharp turns. The purple dashed
curve is a real feasible state trajectory of the system starting from $\Theta$ with a
tracking controller $g_{trk}$, where $g_{trk}$ will be introduced in Example 2.


Fig. 1. Zigzag scenario for a controller synthesis problem. The initial set is blue, the
goal set is green, and the unsafe sets are red. A valid reference trajectory is shown
in black and a feasible trajectory is shown in purple. (Color figure online)


2 $\xi_g$ is a standard solution of an ODE with piece-wise continuous right-hand side.

Consider the standard nonlinear bicycle model of a car [31]. The control
system has 3 state variables: the position $p_x$, $p_y$, and the heading direction $\theta$. Its
motion is controlled by two inputs: linear velocity v and rotational velocity $\omega$.
The car's dynamics are given by:


$$\frac{d}{dt}p_x = v\cos(\theta), \quad \frac{d}{dt}p_y = v\sin(\theta), \quad \frac{d}{dt}\theta = \omega. \tag{3}$$


3 Constructing Reference Trajectories from Waypoints 


If $\xi_{ref}(t) \downarrow p$ is a piecewise linear (PWL) curve in the workspace W, we call $\xi_{ref}(t)$ a
PWL reference trajectory. In W, a PWL curve can be determined by the
endpoints of each line segment. We call such endpoints the waypoints of the
PWL reference trajectory. In Fig. 1, the black points $p_0, \cdots, p_6$ are waypoints of
$p_{ref}(t) = \xi_{ref}(t) \downarrow p$.

Consider any vehicle on the plane³ with state variables $p_x, p_y, \theta, v$ (x-position,
y-position, heading direction, linear velocity) and input variables $a, \omega$ (acceleration
and angular velocity). Once the waypoints $\{p_i\}_{i=0}^k$ are fixed, and if we
enforce constant speed $\bar{v}$ (i.e., $\xi_{ref}(t) \downarrow v = \bar{v}$ for all $t \in [0, \xi_{ref}.ltime]$), then $\xi_{ref}(t)$
can be uniquely defined by $\{p_i\}_{i=0}^k$ and $\bar{v}$ using Algorithm 1. The semantics of
$\xi_{ref}$ and $u_{ref}$ returned by Waypoints_to_Traj is that the reference trajectory
requires the vehicle to move at a constant speed $\bar{v}$ along the lines connecting
the waypoints $\{p_i\}_{i=0}^k$. In Example 1, $\xi_{ref}(t), u_{ref}(t)$ can also be constructed using
Waypoints_to_Traj by moving v to the input variables and dropping a.

We notice that if k = 1, $\xi_{ref}(t), u_{ref}(t)$ returned by Algorithm 1 is a valid
state-input trajectory pair. However, if k > 1, $\xi_{ref}(t), u_{ref}(t)$ returned by
Algorithm 1 is usually not a valid state-input trajectory pair. This is because $\theta_{ref}(t)$
is discontinuous at the waypoints and no bounded inputs $u_{ref}(t)$ can drive the
vehicle to achieve such $\theta_{ref}(t)$. Therefore, when k > 1, $\xi_{ref}(t)$ is a PWL reference
trajectory with no $u_{ref}(t)$ such that $\xi_{ref}, u_{ref}$ are solutions of (1).


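Algorithm 1: Waypoints_to_Traj($\{p_i\}_{i=0}^k$, $\bar{v}$)

input : $\{p_i\}_{i=0}^k$, $\bar{v}$
1 $\forall t \in [0, \sum_{i=1}^{k} \|p_i - p_{i-1}\|_2 / \bar{v}]$: $v_{ref}(t) = \bar{v}$, $a_{ref}(t) = 0$, $\omega_{ref}(t) = 0$;
2 $\forall i \ge 1$, $\forall t \in [t_{i-1}, t_i)$ with $t_0 = 0$ and $t_i = \sum_{j=1}^{i} \|p_j - p_{j-1}\|_2 / \bar{v}$:
    $p_{ref}(t) = p_{i-1} + \bar{v}\,(t - t_{i-1})\, \frac{p_i - p_{i-1}}{\|p_i - p_{i-1}\|_2}$,
    $\theta_{ref}(t) = \mathrm{mod}(\mathrm{atan2}(p_{y,i} - p_{y,i-1},\; p_{x,i} - p_{x,i-1}),\; 2\pi)$;
3 $\xi_{ref}(t) = [p_{ref}(t), \theta_{ref}(t), v_{ref}(t)]$;
4 $u_{ref}(t) = [a_{ref}(t), \omega_{ref}(t)]$;
5 return $\xi_{ref}(t), u_{ref}(t)$;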
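
To make the construction in Algorithm 1 concrete, the following is a minimal Python sketch for a 2D workspace. It is an illustration under stated assumptions, not the FACTEST implementation: the function name `waypoints_to_traj`, the closure-based return values, and the choice to map the final time onto the last segment are all hypothetical.

```python
import math


def waypoints_to_traj(waypoints, v_bar):
    """Sketch of Algorithm 1 for 2D waypoints and a constant speed v_bar.

    Returns (xi_ref, u_ref, times), where xi_ref(t) = (p_x, p_y, theta, v),
    u_ref(t) = (a, omega), and times holds the concatenation times t_0..t_k.
    The return format is an assumption made for this illustration.
    """
    if len(waypoints) < 2 or v_bar <= 0:
        raise ValueError("need at least two waypoints and a positive speed")
    for p, q in zip(waypoints, waypoints[1:]):
        if tuple(p) == tuple(q):
            raise ValueError("consecutive waypoints must be distinct")

    # Concatenation times: t_0 = 0, t_i = sum_{j<=i} ||p_j - p_{j-1}||_2 / v_bar.
    times = [0.0]
    for (x0, y0), (x1, y1) in zip(waypoints, waypoints[1:]):
        times.append(times[-1] + math.hypot(x1 - x0, y1 - y0) / v_bar)

    def segment_index(t):
        # Segment i (1-based) covers [t_{i-1}, t_i); the final time is
        # assigned to the last segment.
        for i in range(1, len(times)):
            if t < times[i]:
                return i
        return len(times) - 1

    def xi_ref(t):
        i = segment_index(t)
        (x0, y0), (x1, y1) = waypoints[i - 1], waypoints[i]
        length = math.hypot(x1 - x0, y1 - y0)
        s = v_bar * (t - times[i - 1])  # distance travelled along segment i
        px = x0 + s * (x1 - x0) / length
        py = y0 + s * (y1 - y0) / length
        theta = math.atan2(y1 - y0, x1 - x0) % (2 * math.pi)
        return (px, py, theta, v_bar)

    def u_ref(t):
        return (0.0, 0.0)  # a_ref(t) = 0 and omega_ref(t) = 0

    return xi_ref, u_ref, times
```

Evaluating `xi_ref` just before and at a concatenation time shows the jump in the heading component that makes the pair invalid as a state-input trajectory when k > 1.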


3 A similar construction works for vehicles in 3D workspaces with additional variables.

Proposition 1. Given a sequence of waypoints $\{p_i\}_{i=0}^k$ and a constant speed $\bar{v}$,
$\xi_{ref}(t), u_{ref}(t)$ produced by Waypoints_to_Traj($\{p_i\}_{i=0}^k$, $\bar{v}$) satisfy:

- $p_{ref}(t) = \xi_{ref}(t) \downarrow p$ is a piece-wise continuous function connecting $\{p_i\}_{i=0}^k$.

- At time $t_i = \sum_{j=1}^{i} \|p_j - p_{j-1}\|_2 / \bar{v}$, $p_{ref}(t_i) = p_i$. We call $\{t_i\}_{i=1}^k$ the
concatenation time.

- $\xi_{ref}(t) = \xi_{ref,1}(t) \frown \cdots \frown \xi_{ref,k}(t)$ and $u_{ref}(t) = u_{ref,1}(t) \frown \cdots \frown u_{ref,k}(t)$,
where $(\xi_{ref,i}, u_{ref,i})$ are state-input trajectory pairs returned by the function
Waypoints_to_Traj($\{p_{i-1}, p_i\}$, $\bar{v}$).


Outline of Synthesis Approach. In this section, we presented the algorithm
Waypoints_to_Traj for constructing reference trajectories from an arbitrary
sequence of waypoints. In Sect. 4, we precisely characterize the type of vehicle
tracking controller our method requires from designers. On our tool's webpage
[27], we show with several extra examples that developing such
controllers is indeed non-trivial and far from automatic, yet the bread and butter of control
engineers. In Sect. 5, we present the main synthesis algorithm, which uses the
tracking error bounds from the previous section to construct waypoints for
each initial state, which when passed through Waypoints_to_Traj provide the
solutions to the synthesis problem.


4 Bounding the Error of a Tracking Controller 


4.1 Tracking Error and Lyapunov Functions 


Given a reference controller $g_{ref}$, a tracking controller $g_{trk}$, and an initial state
$x_0 \in \Theta$, the resulting trajectory $\xi_g$ of the closed control system (A closed with
$g_{ref}$ and $g_{trk}$) is a state trajectory that starts from $x_0$ and follows the ODE (2). In
this setting, we define the tracking error at time t to be a continuous function:


$$e : X \times X \to \mathbb{R}^n.$$


When $\xi_g(t)$ and $\xi_{ref}(t)$ are fixed, we also write $e(t) = e(\xi_g(t), \xi_{ref}(t))$, which makes
it a function of time. One thing to remark here is that if $\xi_{ref}(t)$ is discontinuous,
then e(t) is also discontinuous. In this case, the derivative of e(t) cannot be
defined at the points of discontinuity. To start with, let us assume that $g_{ref} =
(\xi_{ref}, u_{ref})$ is a valid state-input pair so $\xi_{ref}$ is a continuous state trajectory. Later
we will see that the analysis can be extended to cases when $\xi_{ref}$ is discontinuous
but a concatenation of continuous state trajectories.

When $(\xi_{ref}, u_{ref})$ is a valid state-input pair and e(t) satisfies a differential
equation $\frac{d}{dt}e(t) = f_e(e(t))$, we use Lyapunov functions, a classic technique
for proving stability of an equilibrium of an ODE, to bound the tracking
error e(t). The Lie derivative $\frac{\partial V}{\partial e} f_e(e)$ below captures the rate of change of the
function V along the trajectories of e(t).

Definition 4 (Lyapunov functions [16]). Fix a state-input reference trajectory
pair $(\xi_{ref}, u_{ref})$, and assume that the dynamics of the tracking error e for a closed
control system A with $g_{ref}$ and $g_{trk}$ can be rewritten as $\frac{d}{dt}e(t) = f_e(e(t))$, where
$f_e(0) = 0$. A continuously differentiable function $V : \mathbb{R}^n \to \mathbb{R}$ satisfying (i)
$V(0) = 0$, (ii) $\forall e \in \mathbb{R}^n$, $V(e) \ge 0$, and (iii) $\forall e \in \mathbb{R}^n$, $\frac{\partial V}{\partial e} f_e(e) \le 0$, is called a
Lyapunov function for the tracking error.


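Example 2. For the car of Example 1, with a continuous reference trajectory
$\xi_{ref}(t) = [x_{ref}(t), y_{ref}(t), \theta_{ref}(t)]^T$, we define the tracking error in a coordinate
frame fixed to the car [13]:

$$\begin{pmatrix} e_x(t) \\ e_y(t) \\ e_\theta(t) \end{pmatrix} = \begin{pmatrix} \cos(\theta(t)) & \sin(\theta(t)) & 0 \\ -\sin(\theta(t)) & \cos(\theta(t)) & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x_{ref}(t) - p_x(t) \\ y_{ref}(t) - p_y(t) \\ \theta_{ref}(t) - \theta(t) \end{pmatrix}. \tag{4}$$


With the tracking controller $g_{trk}$ defined as:


$$\begin{aligned} v(t) &= v_{ref}(t)\cos(e_\theta(t)) + k_1 e_x(t), \\ \omega(t) &= \omega_{ref}(t) + v_{ref}(t)\big(k_2 e_y(t) + k_3 \sin(e_\theta(t))\big), \end{aligned} \tag{5}$$


it has been shown in [13] that when $k_1, k_2, k_3 > 0$, $\frac{d}{dt}\omega_{ref}(t) = 0$, and $\frac{d}{dt}v_{ref}(t) = 0$,


$$V([e_x, e_y, e_\theta]^T) = \frac{1}{2}(e_x^2 + e_y^2) + \frac{1 - \cos(e_\theta)}{k_2} \tag{6}$$


is a Lyapunov function with negative semi-definite time derivative
$\frac{\partial V}{\partial e} f_e = -k_1 e_x^2 - \frac{v_{ref} k_3 \sin^2(e_\theta)}{k_2}$.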

4.2 Bounding Tracking Error Using Lyapunov Functions: Part 1 


Consider a given closed control system, A with $g_{ref}$ and $g_{trk}$. In this section,
we derive upper bounds on the tracking error e. Later in Sect. 5, we will
develop techniques that take the tracking error into consideration for computing
reference trajectories $\xi_{ref}$.

To begin with, we consider state-input reference trajectory pairs $(\xi_{ref}, u_{ref})$
where $u_{ref}$ is continuous, and therefore, $\xi_{ref}$ and $\xi_g$ are differentiable. Let us
assume that the tracking error dynamics ($\frac{d}{dt}e(t) = f_e(e(t))$) has a Lyapunov
function V(e(t)). The following is a standard result that follows from the theory
of Lyapunov functions for dynamical systems.


Lemma 1. Consider any state-input trajectory pair $(\xi_{ref}, u_{ref})$, an initial state
$x_0$, the corresponding trajectory $\xi_g$ of the closed control system, and a constant
$\epsilon > 0$. If the tracking error e(t) has a Lyapunov function V, and if initially
$V(e(0)) \le \epsilon$, then for any $t \in [0, \xi_{ref}.ltime]$, $V(e(t)) \le \epsilon$.


This lemma is proved by showing that $V(e(t)) = V(e(0)) + \int_0^t \frac{d}{d\tau}V(e(\tau))\,d\tau \le
V(e(0))$. The last inequality holds since $\frac{d}{d\tau}V(e(\tau)) = \frac{\partial V}{\partial e} f_e(e(\tau)) \le 0$ for any $\tau \in
[0, t]$ according to the definition of Lyapunov functions (Definition 4).

Lemma 1 says that if we can bound $V(e(0)) = V(e(x_0, \xi_{ref}(0)))$, we can bound
$V(e(\xi_g(t), \xi_{ref}(t)))$ at any time t within the domain of the trajectories, regardless
of the value of $\xi_{ref}(t)$. This could decouple the problem of designing the tracking
controller $g_{trk}$ and synthesizing the reference controller $g_{ref}$ as a state-input
trajectory pair $(\xi_{ref}, u_{ref})$.


Example 3. Given two waypoints $p_0, p_1$ for the car in Example 1, take the
returned value of Waypoints_to_Traj($\{p_0, p_1\}$, $\bar{v}$), move $v_{ref}$ to $u_{ref}$ and drop
$a_{ref}$. Then, the resulting $(\xi_{ref}, u_{ref})$ is a continuous and differentiable state-input
reference trajectory pair. Moreover, if the robot is controlled by the tracking
controller as in Eq. (5), $V(e(t)) = \frac{1}{2}(e_x(t)^2 + e_y(t)^2) + \frac{1 - \cos(e_\theta(t))}{k_2}$ is a Lyapunov
function for the corresponding tracking error $e(t) = [e_x(t), e_y(t), e_\theta(t)]^T$.

From Eq. (4), it is easy to check that $e_x(t)^2 + e_y(t)^2 = (x_{ref}(t) - p_x(t))^2 +
(y_{ref}(t) - p_y(t))^2$ for any time t. Assume that initially the position of the vehicle
satisfies $[p_x(0), p_y(0)]^T \in B_\epsilon([x_{ref}(0), y_{ref}(0)]^T)$. We check that

$$V(e(0)) = \frac{1}{2}(e_x(0)^2 + e_y(0)^2) + \frac{1 - \cos(e_\theta(0))}{k_2} \le \frac{\epsilon^2}{2} + \frac{2}{k_2}.$$

From Lemma 1, we know that $\forall t \in [0, \xi_{ref}.ltime]$, $V(e(t)) \le \frac{\epsilon^2}{2} + \frac{2}{k_2}$.
Then we have $(x_{ref}(t) - p_x(t))^2 + (y_{ref}(t) - p_y(t))^2 = e_x(t)^2 + e_y(t)^2 \le
\epsilon^2 + \frac{4}{k_2}$. That is, the position of the robot at time t satisfies
$[p_x(t), p_y(t)]^T \in B_{\sqrt{\epsilon^2 + 4/k_2}}([x_{ref}(t), y_{ref}(t)]^T)$.


4.3 Bounding Tracking Error Using Lyapunov Functions: Part 2 


Next, let us consider the case where $\xi_{ref}$ is discontinuous. Furthermore, let us
assume that it is a concatenation of several continuous state trajectories
$\xi_{ref,1} \frown \cdots \frown \xi_{ref,k}$. In this case, we call $\xi_{ref}$ a piece-wise reference trajectory. If we have
a sequence of $(\xi_{ref,i}, u_{ref,i})$, each of which is a valid state-input trajectory pair, and the
corresponding error $e_i(t)$ has a Lyapunov function $V_i(e_i(t))$, then we can use
Lemma 1 to bound the error $e_i(t)$ if we know the value of $e_i(0)$. However,
the main challenge in gluing these error bounds together is that e(t) would be
discontinuous with respect to the entire piece-wise $\xi_{ref}(t)$.

Without loss of generality, let us assume that the Lyapunov functions
$V_i(e_i(t))$ share the same format. That is, $\forall i$, $V_i(e_i(t)) = V(e_i(t))$. Let $t_i$ be the
concatenation time points when $\xi_{ref}(t)$ (and therefore e(t)) is discontinuous. We
know that $\lim_{t \to t_i^-} V(e(t)) \ne \lim_{t \to t_i^+} V(e(t))$ since $\lim_{t \to t_i^-} e(t) \ne \lim_{t \to t_i^+} e(t)$.

One insight we can get from Example 3 is that although e(t) is discontinuous
at the times $t_i$, some of the variables influencing e(t) are continuous. For example,
$e_x(t)$ and $e_y(t)$ in Example 3, which represent the error of the positions,
are continuous since both the actual and reference positions of the vehicle are
continuous. If we can further bound the term in V(e(t)) that corresponds to
the other variables, we could analyze the error bound for the entire piece-wise
reference trajectory. With this insight, let us write e(t) as $[e_p(t), e_r(t)]$, where
$e_p(t) = e(t) \downarrow p$ is the projection to W and $e_r(t)$ is the remaining components.

Let us further assume that the Lyapunov function can be written in the form
$V(e(t)) = \alpha(e_p(t)) + \beta(e_r(t))$. Indeed, on the tool's webpage [27] we show
that four commonly used vehicle models (car, robot, underwater vehicle, and
hovercraft) have Lyapunov functions for the tracking error e(t) of this form. If
$\beta(e_r(t))$ can be further bounded, then the tracking error for the entire trajectory
can be bounded using the following lemma.


Lemma 2. Consider $\xi_{ref} = \xi_{ref,1} \frown \cdots \frown \xi_{ref,k}$ and $u_{ref} = u_{ref,1} \frown \cdots \frown u_{ref,k}$
as a piecewise reference and input with each $(\xi_{ref,i}, u_{ref,i})$ being a state-input
trajectory pair. Suppose (1) $V(e(t)) = \alpha(e_p(t)) + \beta(e_r(t))$ is a Lyapunov function
for the tracking error e(t) of each piece $(\xi_{ref,i}, u_{ref,i})$; (2) $e_p(t)$ is continuous and
$\alpha(\cdot)$ is a continuous function; (3) $\beta(e_r(t)) \in [b_l, b_u]$; and (4) $V(e(0)) \le \epsilon_0$. Then,
the tracking error e(t) with respect to $\xi_{ref}$ and $u_{ref}$ can be bounded by


$$V(e(t)) \le \epsilon_i, \quad \forall i \ge 1,\ \forall t \in [t_{i-1}, t_i),$$


where $\forall i > 1$, $\epsilon_i = \epsilon_{i-1} - b_l + b_u$, $\epsilon_1 = \epsilon_0$ being the bound on the initial tracking
error, and the $t_i$ are the time points of concatenation⁴.


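Proof. We prove this by induction on i. When i = 1, we know from Lemma 1
that if the initial tracking error is bounded by $V(e(0)) \le \epsilon_0$, then for any
$t \in [0, t_1)$, $V(e(t)) \le V(e(0)) \le \epsilon_0 = \epsilon_1$, so the lemma holds.

Fix any $i \ge 1$. If $V(e(t_{i-1})) \le \epsilon_i$, from Lemma 1 we have $\forall t \in [t_{i-1}, t_i)$,
$V(e(t)) \le \epsilon_i$. Also, $\lim_{t \to t_i^-} V(e(t)) = \lim_{t \to t_i^-} \alpha(e_p(t)) + \beta(e_r(t)) \le \epsilon_i$. Since
$\forall e_r(t) \in \mathbb{R}^{n-d}$, $\beta(e_r(t)) \in [b_l, b_u]$, we have $\lim_{t \to t_i^-} \alpha(e_p(t)) \le \epsilon_i - b_l$, and
$\lim_{t \to t_i^-} \alpha(e_p(t)) = \lim_{t \to t_i^+} \alpha(e_p(t))$. Therefore,


$$\lim_{t \to t_i^+} V(e(t)) = \lim_{t \to t_i^+} \alpha(e_p(t)) + \beta(e_r(t)) \le \epsilon_i - b_l + b_u = \epsilon_{i+1}.$$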


Another observation we have on the four vehicle models used in this paper is
that not only can V(e(t)) be written as $\alpha(e_p(t)) + \beta(e_r(t))$ with $\beta(e_r(t))$ being
bounded, but also $\alpha(e_p(t))$ can be written as $\alpha(e_p(t)) = c\,e_p^T(t)e_p(t) = c\|p(t) -
p_{ref}(t)\|_2^2$, where $c \in \mathbb{R}$ is a scalar constant; $p(t) = \xi_g(t) \downarrow p$ and $p_{ref}(t) = \xi_{ref}(t) \downarrow p$
are the actual position and reference position of the vehicle. In this case, we can
further bound the position of the vehicle p(t).


Lemma 3. In addition to the assumptions of Lemma 2, if $\alpha(e_p(t)) =
c\,e_p^T(t)e_p(t) = c\|p(t) - p_{ref}(t)\|_2^2$, where $c \in \mathbb{R}$, $p(t) = \xi_g(t) \downarrow p$ and $p_{ref}(t) =
\xi_{ref}(t) \downarrow p$, then we have that at time $t \in [t_{i-1}, t_i)$,


$$\|p(t) - p_{ref}(t)\|_2^2 \le \frac{\epsilon_i - b_l}{c},$$


where $\epsilon_i$ and $b_l$ are from Lemma 2, which implies that


$$p(t) \in B_{\ell_i}(p_{ref}(t)), \quad \text{with } \ell_i = \sqrt{\frac{\epsilon_i - b_l}{c}}.$$


4 $\forall t \in [t_{i-1}, t_i)$, $\xi_{ref}(t) = \xi_{ref,i}\big(t - \sum_{j=1}^{i-1} \xi_{ref,j}.ltime\big)$.

Note that Lemmas 2 and 3 do not depend on the concrete values of $\xi_{ref}$ and
$u_{ref}$. The lemmas hold for any piece-wise reference trajectory $\xi_{ref}$ and reference
input $u_{ref}$ as long as the corresponding error e has a Lyapunov function (for each
piece of $\xi_{ref}$ and $u_{ref}$).
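
The recurrence of Lemma 2 and the radius of Lemma 3 are simple enough to compute directly. The following sketch is illustrative only; the function name `get_bounds` and its argument order are assumptions, and the code additionally assumes c > 0 so that the square root is well defined.

```python
import math


def get_bounds(eps0, b_l, b_u, c, k):
    """Position error radii ell_1..ell_k from Lemmas 2 and 3.

    eps_1 = eps0, eps_i = eps_{i-1} - b_l + b_u for i > 1,
    ell_i = sqrt((eps_i - b_l) / c).
    """
    if c <= 0:
        raise ValueError("c must be positive")
    if b_u < b_l or eps0 < b_l:
        raise ValueError("expected b_l <= b_u and eps0 >= b_l")
    ells = []
    eps = eps0
    for i in range(1, k + 1):
        if i > 1:
            eps = eps - b_l + b_u  # bound grows by (b_u - b_l) per concatenation
        ells.append(math.sqrt((eps - b_l) / c))
    return ells
```

For the car of Example 3, with $b_l = 0$, $b_u = 2/k_2$, $c = 1/2$ and $\epsilon_0 = \epsilon^2/2 + 2/k_2$, this yields $\ell_i = \sqrt{\epsilon^2 + 4i/k_2}$, so the radius grows with every additional segment.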


Example 4. Continue Example 3.

Now let us consider the case of a sequence of waypoints $\{p_i\}_{i=0}^k$. Let
$(\xi_{ref}, u_{ref})$ = Waypoints_to_Traj($\{p_i\}_{i=0}^k$, $\bar{v}$).
From Example 3, we know that $V(e(t)) = \frac{1}{2}(e_x(t)^2 + e_y(t)^2) + \frac{1 - \cos(e_\theta(t))}{k_2}$ is a
Lyapunov function for each segment of the piece-wise reference trajectory $\xi_{ref}(t)$.
We also know that for any value of $e_\theta$, the term $\frac{1 - \cos(e_\theta(t))}{k_2} \in [0, \frac{2}{k_2}]$.
From Lemma 2, we have that for $t \in [t_{i-1}, t_i)$, where the $t_i$ are the concatenation
time points, $V(e(t)) \le V(e(0)) + \frac{2(i-1)}{k_2}$. Therefore, following Example 3,
initially $V(e(0)) \le \frac{\epsilon^2}{2} + \frac{2}{k_2}$. Then $\forall t \in [t_{i-1}, t_i)$, $V(e(t)) \le \frac{\epsilon^2}{2} + \frac{2i}{k_2}$
and the position of the robot satisfies
$[p_x(t), p_y(t)]^T \in B_{\sqrt{\epsilon^2 + 4i/k_2}}([x_{ref}(t), y_{ref}(t)]^T)$.

As seen in Fig. 2, we bloat the black reference trajectory $p_{ref}(t) = \xi_{ref}(t) \downarrow p$
by $\ell_i = \sqrt{\epsilon^2 + \frac{4i}{k_2}}$ for the $i$-th line segment; the bloated tube contains the real
position trajectories (purple curves) p(t) of the closed system.


Fig. 2. Illustration of the error bounds computed from Lemma 3. The $i$-th line
segment is bloated by $\ell_i = \sqrt{\epsilon^2 + \frac{4i}{k_2}}$. The closed-loop system's trajectories p(t)
are the purple curves and they are contained in the bloated tube. (Color figure online)

5 Synthesizing the Reference Trajectories 


In Sect. 4.3, we have seen that under certain conditions, the tracking error e(t)
between an actual closed-loop trajectory $\xi_g(t)$ and a piece-wise reference $\xi_{ref}(t)$
can be bounded by a piece-wise constant value, which depends on the initial
tracking error e(0) and the number of segments in $\xi_{ref}$. We have also seen an
example nonlinear vehicle model with PWL $\xi_{ref}$ for which the tracking error can
be bounded.

In this section, we discuss how to utilize such a bound on e(t) to help find a
reference controller $g_{ref}$ consisting of a reference trajectory $\xi_{ref}(t)$ and a reference
input $u_{ref}(t)$ such that closed-loop trajectories $\xi_g(t)$ from a neighborhood of
$\xi_{ref}(0)$ that are trying to follow $\xi_{ref}(t)$ are guaranteed to satisfy the reach-avoid
requirement. The idea of finding a $g_{ref}$ follows a classic approach in robot motion
planning. The intuition is that if we know that at any time $t \in [0, \xi_{ref}.ltime]$,
$\|\xi_g(t) \downarrow p - \xi_{ref}(t) \downarrow p\|_2$ will be at most $\ell$, then instead of requiring $\xi_{ref}(t) \downarrow p$ to be
at least $\ell$ away from the obstacles (inside the goal region), we will bloat the
obstacles (shrink the goal set) by $\ell$. Then the original problem is reduced to
finding a $\xi_{ref}(t)$ such that $\xi_{ref}(t) \downarrow p$ can avoid the bloated obstacles and reach
the shrunk goal set.


5.1 Use PWL Reference Trajectories for Vehicle Models 


Finding a reference trajectory $\xi_{ref}(t)$ such that (a) $\xi_{ref}(t)$ satisfies the reach-avoid
conditions, and (b) $\xi_{ref}(t)$ and $u_{ref}(t)$ are concatenations of state-input trajectory
pairs $\{(\xi_{ref,i}, u_{ref,i})\}_i$ and each pair satisfies the system dynamics, is a nontrivial
problem. If we were to encode the problem directly as a satisfiability or an
optimization problem, the solver would have to search over the space of continuous
functions constrained by the above requirements, including the nonlinear
differential constraints imposed by f. The standard tactic is to fix a reasonable
template for $\xi_{ref}(t), u_{ref}(t)$ and search for instantiations of this template.

From Example 4, we see that if $\xi_{ref}$ is a PWL reference trajectory constructed
from waypoints in the workspace, the tracking error can be bounded
using Lemma 2. PWL reference trajectories connecting the waypoints in the
workspace also have the flexibility to satisfy the reach-avoid requirement. Therefore,
in this section, we fix $\xi_{ref}$ and $u_{ref}$ to be the reference trajectory and reference
input returned by Waypoints_to_Traj($\cdot$, $\cdot$). In Sect. 5.2, we will see that the
problem of finding such a PWL $\xi_{ref}(t)$ can be reduced to a satisfiability problem
over quantifier-free linear real arithmetic, which can be solved effectively by
off-the-shelf SMT solvers (see Sect. 6 for empirical results).


5.2 Synthesizing Waypoints for a Linear Reference Trajectory 


Algorithm 1 says that $\xi_{ref}(t)$ and $u_{ref}(t)$ can be uniquely constructed given a
sequence of waypoints $\{p_i\}_{i=0}^k$ in the workspace W and a constant velocity $\bar{v}$.
From Proposition 1, $p_{ref}(t) = \xi_{ref}(t) \downarrow p$ connects the waypoints in W. Also, let
$t_i = \sum_{j=1}^{i} \|p_j - p_{j-1}\|_2 / \bar{v}$ be the concatenation time; $\forall t \in [t_{i-1}, t_i)$, $p_{ref}(t)$ is the
line segment connecting $p_{i-1}$ and $p_i$. We want to ensure that $p(t) = \xi_g(t) \downarrow p$
satisfies the reach-avoid requirements. From Lemma 3, for any $t \in [t_{i-1}, t_i)$, we
can bound $\|p(t) - p_{ref}(t)\|_2$ with the constant $\ell_i$, so the remaining problem is
to ensure that $p_{ref}(t)$ is at least $\ell_i$ away from the obstacles and $p_{ref}(\xi_{ref}.ltime)$ is
inside the goal set with $\ell_k$ distance to any surface of the goal set.

Let us start with one segment $p_{ref}(t)$ with $t \in [t_{i-1}, t_i)$. To enforce that $p_{ref}(t)$
is $\ell_i$ away from a polytope obstacle, a sufficient condition is to enforce that both
endpoints of the line segment lie outside at least one surface of the polytope
bloated by $\ell_i$.


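Lemma 4. Let $p_{ref}(t)$ with $t \in [t_{i-1}, t_i)$ be a line segment connecting $p_{i-1}$ and $p_i$
in W. Given a polytope obstacle $O = Poly(H_O, b_O)$ and $\ell_i > 0$, if


$$\bigvee_{s=1}^{dP(H_O)} \Big( \big(H_O^{(s)} p_{i-1} > b_O^{(s)} + \|H_O^{(s)}\|_2 \ell_i\big) \wedge \big(H_O^{(s)} p_i > b_O^{(s)} + \|H_O^{(s)}\|_2 \ell_i\big) \Big) = True,$$


then $\forall t \in [t_{i-1}, t_i)$, $B_{\ell_i}(p_{ref}(t)) \cap O = \emptyset$.

Proof. Fix any s such that $(H_O^{(s)} p_{i-1} > b_O^{(s)} + \|H_O^{(s)}\|_2 \ell_i) \wedge (H_O^{(s)} p_i > b_O^{(s)} +
\|H_O^{(s)}\|_2 \ell_i)$ holds. The set $S = \{q \in \mathbb{R}^d \mid H_O^{(s)} q > b_O^{(s)} + \|H_O^{(s)}\|_2 \ell_i\}$ defines a
convex half space. Therefore, if $p_{i-1} \in S$ and $p_i \in S$, then any point on the
line segment connecting $p_{i-1}$ and $p_i$ is in S. Therefore, for any $t \in [t_{i-1}, t_i)$,
$H_O^{(s)} p_{ref}(t) > b_O^{(s)} + \|H_O^{(s)}\|_2 \ell_i > b_O^{(s)}$, which means $p_{ref}(t) \notin O$.

The distance between $p_{ref}(t)$ and the surface $H_O^{(s)} q = b_O^{(s)}$ is
$\frac{H_O^{(s)} p_{ref}(t) - b_O^{(s)}}{\|H_O^{(s)}\|_2} > \ell_i$.
Therefore, for any $p \in B_{\ell_i}(p_{ref}(t))$ we have $\|p - p_{ref}(t)\|_2 \le \ell_i$ and thus $p \notin O$.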


Furthermore, $\bigwedge_{s=1}^{dP(H_O)} \big(H_O^{(s)} q \le b_O^{(s)} + \|H_O^{(s)}\|_2 \ell_i\big)$ defines a new polytope that
we get by bloating $Poly(H_O, b_O)$ with $\ell_i$. Basically, it is constructed by moving
each surface of $Poly(H_O, b_O)$ along the surface's normal vector with the direction
pointing outside the polytope.

Similarly, we can define the condition when $p_{ref}(\xi_{ref}.ltime) = p_k$ is inside the
goal shrunk by $\ell_k$.


Lemma 5. Given a polytope goal set $G = Poly(H_G, b_G)$ and $\ell_k > 0$, if


$$\bigwedge_{s=1}^{dP(H_G)} \big(H_G^{(s)} p_k \le b_G^{(s)} - \|H_G^{(s)}\|_2 \ell_k\big) = True, \quad \text{then } B_{\ell_k}(p_k) \subseteq G.$$


Putting them all together, we want to solve the following satisfiability problem
to ensure that each line segment between $p_{i-1}$ and $p_i$ is at least $\ell_i$ away
from all the obstacles and $p_k$ is inside the goal set G with at least distance $\ell_k$ to
the surfaces of G. In this way, $\xi_g(t)$ starting from a neighborhood of $\xi_{ref}(0)$ can
satisfy the reach-avoid requirement.


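$$\begin{aligned}
\phi_{waypoints}(p_{ref}(0), k, O, G, \{\ell_i\}_{i=1}^k) = \exists p_0, \cdots, p_k,\quad & p_0 == p_{ref}(0) \\
\wedge\ & \bigwedge_{s=1}^{dP(H_G)} \big(H_G^{(s)} p_k \le b_G^{(s)} - \|H_G^{(s)}\|_2 \ell_k\big) \\
\wedge\ & \bigwedge_{i=1}^{k} \bigwedge_{Poly(H,b) \in O} \bigvee_{s=1}^{dP(H)} \Big( \big(H^{(s)} p_{i-1} > b^{(s)} + \ell_i \|H^{(s)}\|_2\big) \wedge \big(H^{(s)} p_i > b^{(s)} + \ell_i \|H^{(s)}\|_2\big) \Big)
\end{aligned}$$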

Notice that the constraints in $\phi_{waypoints}$ are all linear over real
arithmetic. Moreover, the number of constraints in $\phi_{waypoints}$ is
$O\big(\sum_{Poly(H,b) \in O} k\, dP(H) + dP(H_G)\big)$. That is, fixing k, the number of constraints
will grow linearly with the total number of surfaces in the obstacle and goal set
polytopes. Fixing O and G, the number of constraints will grow linearly with the
number of line segments k.

Theorem 1. Fix $k \ge 1$ as the number of line segments, and $p_{ref}(0) \in W$ as the
initial position of the reference trajectory. Assume that


(1) A closed with $g_{ref}$ and $g_{trk}$ is such that given any sequence of k+1 waypoints
in W and any $\bar{v}$, the piece-wise reference $\xi_{ref}$ (and input $u_{ref}$) returned by
Algorithm 1 satisfy the conditions in Lemmas 2 and 3 with Lyapunov function
V(e(t)) for the tracking error e(t).

(2) For the above $\xi_{ref}$, fix an $\epsilon_0$ such that $V(e(0)) \le \epsilon_0$, and let $\{\ell_i\}_{i=1}^k$ be error
bounds for positions constructed using Lemma 2 and Lemma 3 from $\epsilon_0$.

(3) $\phi_{waypoints}(p_{ref}(0), k, O, G, \{\ell_i\}_{i=1}^k)$ is satisfiable with waypoints $\{p_i\}_{i=0}^k$.


Let $\xi_{ref}(t), u_{ref}(t)$ = Waypoints_to_Traj($\{p_i\}_{i=0}^k$, $\bar{v}$), and $p_{ref}(t) = \xi_{ref}(t) \downarrow p$.
Let $\xi_g(t)$ be a trajectory of A closed with $g_{trk}(\cdot, \xi_{ref}, u_{ref})$ starting from $\xi_g(0)$ with
$V(e(\xi_g(0), \xi_{ref}(0))) \le \epsilon_0$; then $\xi_g(t)$ satisfies the reach-avoid requirement.


Proof. Since $\xi_{ref}(t), u_{ref}(t)$ are a PWL reference trajectory and a reference input
respectively constructed from the waypoints $\{p_i\}_{i=0}^k$, they satisfy Assumption
(1). Moreover, $V(e(\xi_g(0), \xi_{ref}(0))) \le \epsilon_0$ satisfies Assumption (2). Using Lemma 2
and Lemma 3, we know that for $t \in [t_{i-1}, t_i)$, $\|\xi_g(t) \downarrow p - \xi_{ref}(t) \downarrow p\|_2 \le \ell_i$.

Finally, since $\{p_i\}_{i=0}^k$ satisfy the constraints in $\phi_{waypoints}$, using Lemma 4 and
Lemma 5, we know that for any time $t \in [0, t_k]$, $\xi_g(t) \downarrow p \notin O$ and $\xi_g(t_k) \in G$.
Therefore the theorem holds.


5.3 Partitioning the Initial Set 


Starting from the entire initial set $\Theta$, fix $\xi_{ref}(0) \in \Theta$ and an $\epsilon_0$ such that $\forall x \in
\Theta$, $V(e(x, \xi_{ref}(0))) \le \epsilon_0$; then we can use Lemma 2 and Lemma 3 to construct the
error bounds $\{\ell_i\}_{i=1}^k$ for positions, and next use $\{\ell_i\}_{i=1}^k$ to solve $\phi_{waypoints}$,
find the waypoints, and construct the reference trajectory.

However, if the initial set $\Theta$ is too large, $\{\ell_i\}_{i=1}^k$ could be too conservative
so that $\phi_{waypoints}$ is not satisfiable. In the first two figures on the top row of Fig. 3,
we can see that if we bloat the obstacle polytopes using the largest $\ell_i$, then
no reference trajectory is feasible. In this case, we partition the initial set $\Theta$ into
several smaller covers $\Theta_j$ and repeat the above steps from each smaller cover $\Theta_j$.
From Lemma 2 and Lemma 3 we can see that the values of $\{\ell_i\}_{i=1}^k$ decrease if $\epsilon_0$
decreases. Therefore, as $\Theta$ is partitioned further, a reference trajectory becomes
easier to find. As shown in the bottom row of Fig. 3, after several
partitions, a reference trajectory for each $\Theta_j$ could be found.

Fig. 3. Top row: each step attempting to find a reference trajectory in the space where
obstacles (goal set) are bloated (shrunk) by the error bounds $\{\ell_i\}_i$. From left to right:
Without partition, $\{\ell_i\}_i$ are too large so a reference trajectory cannot be found. $\Theta$ is
partitioned, but $\{\ell_i\}_i$ for the left-top cover are still too large. With further partitions,
a reference trajectory could be found. Bottom row: It is shown that the bloated tubes
for each cover (which contain all other trajectories from that cover) can fit between
the original obstacles.
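
The text does not fix a particular partitioning scheme. As one plausible illustration, assuming the initial set is an axis-aligned box, the sketch below splits the box at the midpoint of every axis and reports the center and Euclidean radius of each piece. The box representation and the names `partition_box`, `center`, and `radius` are hypothetical.

```python
import math
from itertools import product


def partition_box(box):
    """Split box = [(lo, hi), ...] at the midpoint of each axis.

    Returns 2^d sub-boxes for a d-dimensional box.
    """
    halves = [((lo, (lo + hi) / 2), ((lo + hi) / 2, hi)) for lo, hi in box]
    return [list(choice) for choice in product(*halves)]


def center(box):
    """Center point of the box, a natural choice for xi_ref(0)."""
    return tuple((lo + hi) / 2 for lo, hi in box)


def radius(box):
    """Largest Euclidean distance from the center to any point of the box."""
    return math.sqrt(sum(((hi - lo) / 2) ** 2 for lo, hi in box))
```

Each split halves the radius of the cover, which shrinks the initial position error that enters $\epsilon_0$ and therefore every $\ell_i$ derived from it.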


5.4 Overall Synthesis Algorithm 


Taking partitioning into the overall algorithm, we have Algorithm 2 to solve
the controller synthesis problem defined in Sect. 2.2. Algorithm 2 takes as
inputs (1) an (n,m)-dimensional control system A, (2) a tracking controller
$g_{trk}$, (3) obstacles O, (4) a goal set G, (5) a Lyapunov function V(e(t)) for the
tracking error e that satisfies the conditions in Lemma 2 and Lemma 3 for any
PWL reference trajectory and input, (6) the maximum number of line segments
allowed $Seg_{max}$, (7) the maximum number of partitions allowed $Part_{max}$, and (8)
a constant velocity $\bar{v}$. The algorithm returns a set RefTrajs, such that for each
triple $(\Theta_j, \xi_{j,ref}, u_{j,ref}) \in$ RefTrajs, we have $\forall x_0 \in \Theta_j$, the unique trajectory $\xi_g$
of the closed system (A closed with $g_{trk}(\cdot, \xi_{j,ref}, u_{j,ref})$) starting from $x_0$ satisfies
the reach-avoid requirement. The algorithm also returns (Cover, None), which
means that the algorithm fails to find controllers for the portion of the initial
set in Cover within the maximum number of partitions $Part_{max}$.

In Algorithm 2, Cover is the collection of covers in $\Theta$ for which the corresponding
$\xi_{ref}$ and $u_{ref}$ have not been discovered. Initially, Cover only contains $\Theta$. The
for-loop from Line 2 will try to find a $\xi_{ref}$ and a $u_{ref}$ for each $\Theta \in$ Cover until the
maximum allowed number of partitions is reached. At Line 3, we fix the initial
state $\xi_{ref}(0) = \xi_{init}$ to be the center of the current cover $\Theta$. Then at Line 4,
we get the initial error bound $\epsilon_0$ after fixing $\xi_{init}$. Using $\epsilon_0$ and the Lyapunov
function V(e), we can construct the error bounds $\{\ell_i\}_{i=1}^k$ for the positions of the
vehicle using Lemma 2 and Lemma 3 at Line 5.

Algorithm 2: Controller synthesis algorithm
input : $A = (X, \Theta, U, f)$, $g_{trk}$, O, G, V(e(t)), $Seg_{max}$, $Part_{max}$, $\bar{v}$
initially: Cover $\leftarrow \{\Theta\}$, prt $\leftarrow 0$, k $\leftarrow 1$, RefTrajs $\leftarrow \emptyset$
1 while (Cover $\ne \emptyset$) $\wedge$ (prt < $Part_{max}$) do
2     for $\Theta \in$ Cover do
3         $\xi_{init} \leftarrow$ Center($\Theta$) ;
4         $\epsilon_0 \leftarrow a$ such that $\forall x \in \Theta$, $V(e(x, \xi_{init})) \le a$ ;
5         $\{\ell_i\}_{i=1}^k \leftarrow$ GetBounds(V(e(t)), $\epsilon_0$) ;
6         while k $\le Seg_{max}$ do
7             if CheckSAT($\phi_{waypoints}(\xi_{init} \downarrow p, k, O, G, \{\ell_i\}_{i=1}^k)$) == SAT then
8                 $p_0, \cdots, p_k \leftarrow$ GetValue($\phi_{waypoints}$) ;
9                 $\xi_{ref}, u_{ref} \leftarrow$ Waypoints_to_Traj($\{p_i\}_{i=0}^k$, $\bar{v}$) ;
10                RefTrajs $\leftarrow$ RefTrajs $\cup$ $(\Theta, \xi_{ref}, u_{ref})$ ;
11                Cover $\leftarrow$ Cover $\setminus \{\Theta\}$ ;
12                k $\leftarrow 1$ ;
13                Break ;
14            else
15                k $\leftarrow$ k + 1
16        if k > $Seg_{max}$ then
17            Cover $\leftarrow$ Cover $\cup$ Partition($\Theta$) $\setminus \{\Theta\}$ ;
18            prt $\leftarrow$ prt + 1 ;
19            k $\leftarrow 1$ ;
20 return RefTrajs, (Cover, None) ;


If the if condition at Line 7 holds with $\{p_i\}_{i=0}^k$ being the waypoints that
satisfy $\phi_{waypoints}$, then from Theorem 1 we know that the $\xi_{ref}, u_{ref}$ constructed
using $\{p_i\}_{i=0}^k$ at Line 9 will be such that the unique trajectory $\xi_g$ of the closed
system (A closed with $g_{trk}(\cdot, \xi_{ref}, u_{ref})$) starting from $x_0 \in \Theta$ satisfies the
reach-avoid requirement. Otherwise the algorithm will increase the number of segments
k in the PWL reference trajectory (Line 15). When the maximum number of line
segments allowed is reached but the algorithm still could not find $\xi_{ref}, u_{ref}$ that
can guarantee the satisfaction of the reach-avoid requirement from the current cover
$\Theta$, we will partition the current $\Theta$ at Line 17 and add those partitions to Cover.
At the same time, k will be reset to 1.
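
The control flow described above can be sketched as a small driver that takes the problem-specific pieces as callables. This is a simplified illustration, not the FACTEST code: the error bounds are recomputed for each k, the cover is rebuilt once per pass rather than mutated in place, and the parameter names (`center`, `initial_bound`, `get_bounds`, `solve_waypoints`, `partition`) are hypothetical stand-ins for Center, Line 4, GetBounds, the CheckSAT/GetValue pair, and Partition.

```python
def synthesize(theta, seg_max, part_max, center, initial_bound,
               get_bounds, solve_waypoints, partition):
    """Simplified sketch of the partition-and-solve loop of Algorithm 2.

    solve_waypoints(xi_init, k, ells) must return a list of k + 1 waypoints
    or None when no waypoints exist for k segments.
    Returns (ref_trajs, cover): solved (region, waypoints) pairs and the
    regions that remain unsolved when the partition budget runs out.
    """
    cover = [theta]
    ref_trajs = []
    prt = 0
    while cover and prt < part_max:
        unresolved = []
        for region in cover:
            xi_init = center(region)
            eps0 = initial_bound(region, xi_init)
            waypoints = None
            # Try 1, 2, ..., seg_max segments before giving up on this region.
            for k in range(1, seg_max + 1):
                ells = get_bounds(eps0, k)
                waypoints = solve_waypoints(xi_init, k, ells)
                if waypoints is not None:
                    break
            if waypoints is not None:
                ref_trajs.append((region, waypoints))
            else:
                # No solution within seg_max segments: split the region.
                unresolved.extend(partition(region))
                prt += 1
        cover = unresolved
    return ref_trajs, cover
```

Passing the solver as a callable keeps the sketch independent of any particular SMT backend; in a fuller version `solve_waypoints` would build the constraints for the given k and bounds and query the solver.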


Theorem 2 (Soundness). Suppose the inputs to Algorithm 2, A, $g_{trk}$, O, G,
V(e(t)), $\bar{v}$ satisfy the conditions of Theorem 1. Let the output be RefTrajs =
$\{(\Theta_j, \xi_{j,ref}, u_{j,ref})\}_j$ and (Cover, None); then we have (1) $\Theta \subseteq \bigcup_j \Theta_j \cup$ Cover, and
(2) for each triple $(\Theta_j, \xi_{j,ref}, u_{j,ref})$, we have $\forall x_0 \in \Theta_j$, the unique trajectory $\xi_g$
of the closed system (A closed with $g_{trk}(\cdot, \xi_{j,ref}, u_{j,ref})$) starting from $x_0$ satisfies
the reach-avoid requirement.


The theorem follows directly from the proof of Theorem 1. 


6 Implementation and Evaluation 


We have implemented our synthesis algorithm (Algorithm 2) in a prototype open
source tool we call FACTEST⁵ (FAst ConTrollEr SynThesis framework). Our
implementation uses the Pypoman⁶, Yices 2.2 [6], SciPy⁷ and NumPy⁸ libraries.
The inputs to FACTEST are the same as the inputs in Algorithm 2. FACTEST
terminates in two ways. Either it finds a reference trajectory $\xi_{j,ref}$ and reference
input $u_{j,ref}$ for every partition $\Theta_j$ of $\Theta$ so that Theorem 2 guarantees they solve
the controller synthesis problem. Otherwise, it terminates by failing to find
reference trajectories for at least one subset of $\Theta$ after partitioning $\Theta$ up to the
maximum specified depth.


5 All models and source code of FACTEST are available at [27].


6.1 Benchmark Scenarios: Vehicle Models and Workspaces 


We report on evaluating FACTEST in several 2D and 3D scenarios drawn
from the motion planning literature (see Fig. 4). Recall that the dimension of the state
space X corresponds to the vehicle model, and is separate from the dimensionality
of the workspace W. We use four nonlinear vehicle models in these different
scenarios: (a) the kinematic vehicle model (car) [31] introduced in Example 1,
(b) a bijective mobile robot (robot) [13], (c) a hovering robot (hovercraft), and
(d) an autonomous underwater vehicle (AUV) [29]. The dynamics and tracking
controllers ($g_{trk}$) of the other three models are described on the FACTEST
website [27]. Each of these controllers comes with a Lyapunov function that meets
the assumptions of Lemmas 2 and 3 so the tracking error bounds $\{\ell_i\}_{i=1}^k$ given by the
lemmas can be computed.


(a) Zigzag [32] (b) Maze [32] (c) SCOTS [38] (d) Barrier
(e) Simple Env (f) Difficult Env (g) L-tunnel [32] (h) Z-tunnel [32]


Fig. 4. 2D and 3D workspaces with initial (blue) and goal (green) sets. The scenarios
run in the two-dimensional W use the car model. The scenarios run in the
three-dimensional W use the hovercraft model. The black lines denote $\xi_{ref}$ and the dotted
violet lines denote $\xi_g$. (Color figure online)


6 https://pypi.org/project/pypoman/.
7 https://www.scipy.org/.
8 https://numpy.org/.


6.2 Synthesis Performance


Table 1 presents the performance of FACTEST on several synthesis problems.
Several points are worth highlighting. (a) The absolute running time is in the
sub-second range for most scenarios, even for 6-dimensional vehicle models with 4 inputs operating
in a 3D workspace. This is encouraging for online motion-control applications
with dynamic obstacles. (b) The running time is not too sensitive to the dimensions
of X and U because the waypoints are only being generated in the lower-dimensional
workspace W. Additionally, the construction of $\xi_{ref}$ from the waypoints
does not add significant time. However, since different models have different
dynamics and Lyapunov functions, they have different error bounds for
position. Such different bounds could influence the final result. For example, the
result for the Barrier scenario differs between the car and the robot. The car
required 25 partitions to find a solution over all of $\Theta$ and the robot required
22. (c) Confirming what we have seen in Sect. 5.2, the runtime of the algorithm
scales with the number of segments required to solve the scenario and the number
of obstacles. (d) As expected and seen in the Zigzag scenarios, all other things
being the same, the running time and the number of partitions grow with larger
initial set uncertainty.


Table 1. Synthesis performance on different scenarios (environment, vehicle).
Dimension of state space X (n), input (m), radius of initial set $\Theta$, number
of obstacles O, running time (in seconds).


Scenario             | n, m | Radius of $\Theta$ | # O | Time (s) | # segments per ref | # partitions
Zigzag, car 1        | 3, 2 | 0.200              | 9   | 0.037    | 6.0                | 1.0
Zigzag, car 2        | 3, 2 | 0.400              | 9   | 0.212    | 4.0                | 6.0
Zigzag, car 3        | 3, 2 | 0.800              | 9   | 0.915    | 5.0-6.0            | 16.0
Zigzag, robot 1      | 4, 2 | 0.200              | 9   | 0.038    | 6.0                | 1.0
Zigzag, robot 2      | 4, 2 | 0.400              | 9   | 0.227    | 4.0                | 6.0
Zigzag, robot 3      | 4, 2 | 0.800              | 9   | 0.911    | 5.0-6.0            | 16.0
Barrier, car         | 3, 2 | 0.707              | 6   | 0.697    | 2.0-4.0            | 25.0
Barrier, robot       | 4, 2 | 0.707              | 6   | 0.645    | 2.0-4.0            | 22.0
Maze, car            | 3, 2 | 0.200              | 22  | 0.174    | 8.0                | 1.0
Maze, robot          | 4, 2 | 0.200              | 22  | 0.180    | 8.0                | 1.0
SCOTS, car           | 3, 2 | 0.070              | 19  | 1.541    | 26.0               | 1.0
SCOTS, robot         | 4, 2 | 0.070              | 19  | 1.623    | 26.0               | 1.0
L-tunnel, hovercraft | 4, 3 | 0.173              | 10  | 0.060    | 5.0                | 1.0
L-tunnel, AUV        | 6, 4 | 1.732              | 10  | 0.063    | 5.0                | 1.0
Z-tunnel, hovercraft | 4, 3 | 0.173              | 5   | 0.029    | 4.0                | 1.0
Z-tunnel, AUV        | 6, 4 | 1.732              | 10  | 0.029    | 4.0                | 1.0


Comparison with Other Motion Controller Synthesis Tools: A Challenge.
Few controller synthesis tools for nonlinear models are available for direct
comparison. We had detailed discussions with the authors of FastTrack [11],
but found it difficult to plug in new vehicle models. RTD [44] is implemented in
MATLAB, also for specific vehicle models. Pessoa [26] and SCOTS [38] are
implemented as general-purpose tools. However, they are based on the construction
of discrete abstractions, which requires several additional user inputs. Therefore,
we were only able to compare FACTEST with SCOTS and Pessoa using the
scenario SCOTS. This scenario was originally built in SCOTS and uses the same
car model.

The results for SCOTS and Pessoa can be found in [38]. The total runtime
of SCOTS consists of the abstraction time $t_{abs}$ and the synthesis time
$t_{syn}$. The Pessoa tool has an abstraction time of $t_{abs} = 13509$ s and a
synthesis time of $t_{syn} = 535$ s, which gives a total time of
$t_{tot} = 14044$ s. The SCOTS tool has an abstraction time of $t_{abs} = 100$ s
and a synthesis time of $t_{syn} = 413$ s, which gives a total time of
$t_{tot} = 513$ s. FACTEST clearly outperforms both SCOTS and Pessoa with a
total runtime of $t_{tot} = 1.541$ s. This could be attributed to the fact that
FACTEST does not have to perform any abstractions, but even looking solely at
$t_{syn}$, FACTEST is significantly faster. However, we do note that the inputs
of FACTEST and SCOTS are different. For example, SCOTS needs a growth bound
function $\beta$ for the dynamics, whereas FACTEST requires Lyapunov functions
for the tracking error.
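
The arithmetic behind this comparison can be checked with a few lines of Python. This is only a sketch: the numbers are the runtimes quoted above, while the dictionary layout and the helper names total_time and speedup are our own and not part of any of the tools.

```python
# Runtimes in seconds as quoted in the text (SCOTS and Pessoa figures from [38]).
runtimes = {
    "Pessoa": {"t_abs": 13509.0, "t_syn": 535.0},
    "SCOTS": {"t_abs": 100.0, "t_syn": 413.0},
}
FACTEST_TOTAL = 1.541  # FACTEST runtime on the SCOTS scenario (Table 1)


def total_time(entry):
    """Total runtime = abstraction time + synthesis time."""
    return entry["t_abs"] + entry["t_syn"]


def speedup(seconds, baseline=FACTEST_TOTAL):
    """Ratio of another tool's runtime to the FACTEST runtime."""
    return seconds / baseline


for tool, entry in runtimes.items():
    print(f"{tool}: t_tot = {total_time(entry):.0f} s, "
          f"total ratio = {speedup(total_time(entry)):.0f}x, "
          f"synthesis-only ratio = {speedup(entry['t_syn']):.0f}x")
```

Even the synthesis-only ratio stays in the hundreds for both tools, which is the point made in the paragraph above.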


6.3 RRT vs. SAT-Plan 


To demonstrate the speed of our SAT-based reference trajectory synthesis algo- 
rithm (i.e. only the while-loop from Line 6 to Line 15 of Algorithm 2 which we 
call SAT-Plan), we compare it with Rapidly-exploring Random Trees (RRT) [20]. 
The running time, number of line segments, and number of iterations needed to 
find a path were compared. RRT was run using the Python Robotics library [39], 
which is not necessarily an optimized implementation. SAT-Plan was run using 
Yices 2.2. The scenarios are displayed in Fig. 4 and the results are in Fig. 5. 


[Figure 5 plots: "Average time to find a path" (left) and "Average iterations to
find a path" (right), bar charts on a logarithmic axis for SAT-Plan and RRT over
the scenarios Easy, Difficult, Zigzag, Partition, Maze, and SCOTS]


Fig. 5. Comparison of RRT and SAT-Plan. The left plot shows the runtime and the 
right plot shows the number of necessary iterations. Note that RRT timed out on the 
SCOTS scenario. 

Each planner was run 100 times. The colored bars represent the average
runtime and average number of iterations. The error bars represent the range
between minimum and maximum. The RRT path planner was given a maximum of 5000
iterations and a path resolution of 0.01. SAT-Plan was given a maximum of 100
line segments to find a path. RRT timed out for the SCOTS scenario, unable
to find a trajectory within 5000 iterations. On the Maze scenario, RRT timed out
in about 10% of the runs.

Overall, SAT-Plan scales in time much better as the size of the unsafe set
increases. Additionally, the maximum number of iterations that RRT had to
perform was far greater than the average number of line segments needed to
find a safe path. This means that the iteration limit given to RRT must be
sufficiently large, or else a safe path will not be found even if one exists.
SAT-Plan does not have randomness and therefore will find a reference trajectory
(with k segments) in the modified space (bloated obstacles and shrunk goal) if
one (with k segments) exists. Various examples of solutions found by RRT and
SAT-Plan can be found on the FACTEST website [27].
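
To make the role of the iteration limit concrete, the following is a minimal textbook-style RRT in Python. It is not the Python Robotics implementation used in the experiments; the function name rrt, its parameters, and the simplified collision check (only the new node is tested, not the whole edge) are assumptions made for this sketch.

```python
import math
import random


def rrt(start, goal, obstacles, max_iter=5000, step=0.5, goal_radius=0.5,
        bounds=(0.0, 10.0), goal_bias=0.1, seed=0):
    """Return waypoints from start into the goal region, or None on timeout.

    obstacles: axis-aligned boxes (xmin, ymin, xmax, ymax).
    Simplification: only the new node is collision-checked, not the edge.
    """
    rng = random.Random(seed)
    nodes = [start]
    parent = {0: None}
    for _ in range(max_iter):
        # Sample the goal with probability goal_bias, otherwise uniformly.
        if rng.random() < goal_bias:
            sample = goal
        else:
            sample = (rng.uniform(*bounds), rng.uniform(*bounds))
        # Extend the nearest tree node by at most `step` toward the sample.
        i_near = min(range(len(nodes)),
                     key=lambda i: math.dist(nodes[i], sample))
        near = nodes[i_near]
        d = math.dist(near, sample)
        if d == 0.0:
            continue
        scale = min(step, d) / d
        new = (near[0] + scale * (sample[0] - near[0]),
               near[1] + scale * (sample[1] - near[1]))
        if any(x0 <= new[0] <= x1 and y0 <= new[1] <= y1
               for x0, y0, x1, y1 in obstacles):
            continue  # rejected: the new node lies inside an obstacle
        nodes.append(new)
        parent[len(nodes) - 1] = i_near
        if math.dist(new, goal) <= goal_radius:
            # Walk the parent pointers back to the root to recover the path.
            path, i = [], len(nodes) - 1
            while i is not None:
                path.append(nodes[i])
                i = parent[i]
            return path[::-1]
    return None  # iteration budget exhausted; a path may still exist
```

Returning None here only means that the budget ran out, which mirrors the timeouts reported above: a randomized planner with a finite iteration limit cannot distinguish "no path" from "not found yet".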


7 Conclusion and Discussion 


We introduced a technique for synthesizing correct-by-construction controllers
for nonlinear vehicle models, including ground, underwater, and aerial vehicles,
for reach-avoid requirements. Our tool FACTEST, which implements this technique,
shows very encouraging performance on various vehicle models in different 2D
and 3D scenarios.

There are several directions for future investigations. (1) One could explore
a broader class of reference trajectories to reduce the tracking error bounds. (2)
It would also be useful to extend the technique so that the synthesized controller
satisfies the actuation constraints automatically. (3) Currently we require the
user to provide the tracking controller $g_{trk}$ together with the Lyapunov
functions; it would be interesting to further automate this step.

Abstract. Chemical reaction networks (CRNs) play a fundamental role
in the analysis and design of biochemical systems. They induce continuous-time
stochastic systems, whose analysis is a computationally intensive
task. We present a tool that implements the recently proposed
semi-quantitative analysis of CRNs. Compared to the proposed theory, the
tool implements the analysis so that it is more flexible and more precise.
Further, its GUI offers a wide range of visualization procedures that
facilitate the interpretation of the analysis results as well as guidance to refine
the analysis. Finally, we define and implement a new notion of “mean”
simulations, summarizing the typical behaviours of the system in a way
directly comparable to standard simulations produced by other tools.


1 Introduction 


Chemical Reaction Networks (CRNs) are a language widely used for modelling 
and analysis of biochemical systems [10] as well as for high-level programming of 
molecular devices [6,33]. They provide a compact formalism equivalent to Petri 
nets [30], vector addition systems [24] and distributed population protocols [3]. 
A CRN consists of a set of chemical reactions of given species, each running at 
a certain rate (intuitively, speed). 


Example 1 (Gene expression). Our running example is the classic simple
expression of a protein given by the reactions of production (p) and degradation (d)
of proteins and blocking (b) the DNA, over three species: protein (P), active DNA
(DNA_on), and blocked DNA (DNA_off):


$$p:\ D_{on} \xrightarrow{10} D_{on} + P \qquad d:\ P \xrightarrow{0.1} \emptyset \qquad b:\ D_{on} + P \xrightarrow{0.001} D_{off}$$


Using mass-action kinetics (the reaction rate is multiplied by the populations of
the reactants), the CRN induces the infinite-population Markov chain shown in Fig. 1.
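
The mass-action semantics can be illustrated by a small stochastic simulation in Python (Gillespie's direct method). This is a sketch, not part of SeQuaiA: the rate constants are assumed values read off the transition labels of Fig. 2 (p = 10, d = 0.1, b = 0.001), and the names RATES, propensities, apply_reaction, and simulate are our own.

```python
import random

# Assumed rate constants, read off the transition labels of Fig. 2.
RATES = {"p": 10.0, "d": 0.1, "b": 0.001}


def propensities(state, rates=RATES):
    """Mass-action propensities: rate constant times reactant populations."""
    d_on, d_off, prot = state
    return {
        "p": rates["p"] * d_on,         # D_on -> D_on + P
        "d": rates["d"] * prot,         # P -> (nothing)
        "b": rates["b"] * d_on * prot,  # D_on + P -> D_off
    }


def apply_reaction(state, name):
    """Return the successor state after firing reaction `name` once."""
    d_on, d_off, prot = state
    if name == "p":
        return (d_on, d_off, prot + 1)
    if name == "d":
        return (d_on, d_off, prot - 1)
    return (d_on - 1, d_off + 1, prot - 1)  # blocking consumes D_on and P


def simulate(t_max, seed=0, rates=RATES):
    """Simulate one run up to time t_max; returns a list of (time, state)."""
    rng = random.Random(seed)
    t, state = 0.0, (1, 0, 0)  # one active DNA copy, no protein
    trace = [(t, state)]
    while True:
        props = propensities(state, rates)
        total = sum(props.values())
        if total == 0.0:
            break  # absorbing state: DNA blocked and no protein left
        t += rng.expovariate(total)  # exponential waiting time
        if t > t_max:
            break
        # Pick a reaction with probability proportional to its propensity.
        r = rng.random() * total
        chosen = None
        for name, a in props.items():
            if a <= 0.0:
                continue
            chosen = name
            if r < a:
                break
            r -= a
        state = apply_reaction(state, chosen)
        trace.append((t, state))
    return trace
```

Each state is a triple (D_on, D_off, P); the number of DNA copies D_on + D_off stays at 1 along every run, which is why Fig. 1 can draw the two DNA configurations as two rows of the chain.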

Fig. 1. The Markov chain for Gene expression, displaying the population of P. To
simplify the exposition, $D_{on}$ and $D_{off}$ are displayed as discrete “states”
of the system, but in fact the two “states” are just shorthands for 1,0 and 0,1,
respectively.


In order to facilitate numerous applications in systems and synthetic biology,
various techniques for simulation and formal analysis of CRNs have been
proposed, e.g. [2,7,15,18,32]. We pinpoint several specifics of this setting, necessary
to motivate and understand the features of the tool:


1. The analysis is notoriously difficult and computationally expensive due
   to several aspects: state-space explosion (exponential growth in the number
   of species, possibly infinite spaces due to unbounded populations as in Fig. 1,
   different rates for different populations, again as in Fig. 1), stochasticity (races
   between reactions), stiffness (rates of different magnitudes), and multimodality
   (qualitatively different behaviours such as extinction of predators only, or
   also of prey in predator-prey models) [17,34]. Consequently, even for
   small CRNs, simulations may take minutes and analyses hours.

2. We have to face imprecise inputs. In particular, even if all relevant reactions
   are known, the rates are typically not. It is then not clear what behaviours
   can be induced by all possible values.

3. The analysis output need not be precise numerically, but only qualitatively.
   For instance, it is important to know that initial growth is followed by
   extinction and what the order of magnitude of the peak population is, but not
   necessarily what the exact distribution at an exact time is. Unfortunately, it
   is hard to compute the qualitative information without the quantitative one.

4. Biologists and engineers often seek plausible explanations of why the
   system under study does or does not exhibit the discussed behaviour. In many
   cases, a set of system simulations/trajectories or population distributions is
   not sufficient, and the ability to provide an accurate explanation for the
   temporal or steady-state behaviour is another major challenge for the existing
   techniques.


SeQuaiA¹ is a tool for the analysis of CRNs addressing these issues:


1. It features unprecedented scalability, analysing standard complex
   benchmarks within a fraction of a second.

2. It is robust w.r.t. concrete rates, not depending on the exact values but only
   on their orders of magnitude.

3. Its semi-quantitative analysis is precise enough to draw conclusions on the
   qualitative behaviour of the system, including rare behaviours, and on rough
   estimates of the quantities (population sizes, times).

4. It produces small abstract models (Markov chains) that are explicit, yet
   interpretable, making the behaviour more explainable.


1 Available at https://sequaia.model.in.tum.de.


It is based on the technique presented in [9], relying on two cornerstones. Firstly, 
it computes a system abstraction with acceleration, abstracting not only states 
and single transitions, but taking into account segments of paths. The resulting 
models are small enough to allow for a synoptic observation of the model dynam- 
ics. Secondly, it performs semi-quantitative analysis, focusing on the most 
probable behaviours and more qualitative, global descriptions, such as oscilla- 
tion, rather than fully quantitative sequences of exact transient distributions. 
This yields explainable models and is a sufficient and computationally cheaper 
technique. While the basic theory is derived from [9], there are a number of new 
features and differences in our tool, not just the implementation: 


Method: (i) The abstraction is more precise now that the tool can also com- 
pute numerical outputs, whereas [9] focuses on a manually feasible, and hence 
imprecise, abstraction. (ii) It suggests how to refine the abstractions, provid- 
ing a knob for trading precision for computational resources. 

Visualization: The GUI provides a number of ways to display the results, facil- 
itating understanding the models, including (i) identification of strongly con- 
nected parts of ‘iterations’, corresponding to ‘temporarily stable’ behaviours, 
(ii) quantitative information on transient times and steady-state distribu- 
tions, or (iii) visual qualitative explanations, such as semantic grouping of 
states or tracking correlations between populations. 

Additional analysis instruments: (i) The new notion of envelope provides an 
explicit knob to consider not only the most probable, but also less probable 
behaviours. (ii) The novel concept of mean simulation yields summaries of 
most probable runs and an analysis output directly comparable to classic 
simulation-based tools. 


Related Work. Since a direct analysis of the Markov chains induced by CRNs
does not scale well [19], deterministic approximations through fluid (mean-field)
techniques can be applied [4,8] to large populations, but they cannot adequately
capture the stochasticity of CRNs caused by low-population species. To this
end, both can be combined in hybrid approaches [7,18,21], typically involving
a computationally demanding numerical analysis. Reduction techniques such as
[1,12] are based on approximate bisimulation [11], on aggregation according to
the CRN-specific structure [13,27,35], or on state truncation [20,28,29].

Despite the plethora of techniques, the practical analysis of CRNs often
relies on stochastic simulation [15] and its multi-scale improvements
[5,14,17,22,31,32]. Widely used tools include the platform-independent Copasi [23],
DSD [25] with a convenient web-based graphical interface, and StochPy [26], which
is easily extensible using Python scientific libraries. In contrast, our approach
(i) provides a compact explanation of the system behaviour in the form of tiny
models allowing for a synoptic observation, (ii) can easily reveal less probable
behaviours, and (iii) as shown in [9], is able to analyse standard complex
benchmarks in seconds and thus provides unprecedented scalability compared to
other numerical as well as simulation-based techniques.


2 Workflow and Key Functionality 


In this section, we guide the reader through the workflow, discuss the key features
of the tool, and demonstrate them on examples. The GUI is structured into
several tabs and panels reflecting the workflow of the tool. First, a CRN is either
retrieved from a file in the Open model tab or a new one is created. Either way, the
model can be changed in the Editor panel together with the analysis parameters.
The process continues in the Analysis tab. The analysis proceeds in two steps. First,
the semi-quantitative abstraction of the Markov chain for the CRN is generated;
second, the semi-quantitative analysis is performed on the abstraction. The tool
offers an explicit option to display the abstraction as a .dot file or to directly
run both steps. After the complete analysis is executed, the Visualization panel
offers a range of options to display the results, including various quantitative
properties. Finally, the analysed model can be used to generate concrete runs
on the Simulation tab, which we call mean simulations since they display the
“average-case” behaviour. In the following, we detail these key elements.


[Figure 2 graphic. Left: abstract states [D_on, 0], [D_on, 1-20], [D_on, 21-50],
[D_on, 51-1000] and their D_off counterparts. The top chain carries
interval-labelled transitions such as <p,10>, <d,[0.2,2]>, and <b,[0.001-0.02]>;
the bottom chain carries accelerated transitions such as <Ap,0.36>, <Ad,0.13>,
and <b,0.53>. Right: population of P over Time (s).]


Fig. 2. Left: The abstract Markov chain for Gene expression with population
discretization thresholds 20, 50 and the population bound 1000. Top: The classic may
transition function. Bottom: The semi-quantitative version with accelerated
transitions (denoted by prefix “A”). Right: The full blue line shows a typical
simulation of the model (population of P), obtained using the DSD tool [25]. The
dotted green line corresponds to the fast variant of the model with the rate of b
being $10^{-2}$. (Color figure online)


2.1 Semi-quantitative Abstraction


Key Idea. The abstraction of the state space is simply given by a discretization of
the population for each species into finitely many intervals, see Fig. 2 (left). The
classic may abstraction of the transition function results in non-deterministic
self-loops as in Fig. 2 (left top) in red, which make it impossible to conclude
anything useful (except for some safety properties) on the behaviour once we reach
such a state, even whether it is ever left at all. Instead, [9] considers sequences
of transitions: in this case, sequences of prevalently growing transitions (those
increasing the population) are significantly more probable than the prevalently
decreasing ones. Consequently, the self-looping transitions are accelerated (taken
multiple times) to get a “combined” transition that brings a typical
representative of this population interval into a higher interval, see Fig. 2 (left
bottom), also in red. Hence the new rate reflects (i) the mass-action kinetics with
the typical population in the interval and (ii) the typical number of repetitions
of the transition before another interval is reached. These accelerated transitions
are the key idea of the semi-quantitative abstraction and are denoted by a prefix A.


Tool Inputs. Technically, the tool requires, for each species, a (possibly empty)
list of increasing population thresholds $t_1, t_2, \ldots, t_n$ and a population
bound $t_\infty$. The thresholds split the concrete population into the intervals
$[0,0], (0,t_1], (t_1,t_2], \ldots, (t_{n-1},t_n], (t_n,\infty)$. Here 0 is taken
separately to reflect enabledness of actions; the representatives, used for
subsequent computations, are chosen to be in the middle of the intervals and
derived from $t_\infty$ for the last one. (For the empty list we have only one
non-zero interval $(0,\infty)$.) The input numbers are supposed to reflect the
monitored property of interest and the required precision; the bound $t_\infty$
should give a probable upper bound on the maximal population. How to obtain
and iteratively improve these is discussed in Sect. 2.5 on refinement.
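
The discretization described above can be sketched in Python as follows. The function names discretize and abstract_level are hypothetical, and the representative of the last interval is assumed here to be the midpoint between the last threshold and the bound; the text only says that it is derived from the bound.

```python
import bisect
import math


def discretize(thresholds, bound):
    """Return (intervals, representatives) for one species.

    Intervals are (lo, hi) pairs read as (lo, hi], except the first one,
    which is the singleton [0, 0], and the last one, which is (t_n, inf).
    """
    thresholds = list(thresholds)
    edges = [0] + thresholds + [bound]
    if any(a >= b for a, b in zip(edges, edges[1:])):
        raise ValueError("need 0 < t_1 < ... < t_n < bound")
    intervals = ([(0, 0)]
                 + list(zip(edges[:-2], thresholds))
                 + [(edges[-2], math.inf)])
    # Midpoints; for the last interval the bound stands in for infinity
    # (an assumption of this sketch).
    reps = [0.0] + [(lo + hi) / 2 for lo, hi in zip(edges, edges[1:])]
    return intervals, reps


def abstract_level(population, thresholds):
    """Index of the interval containing a concrete population."""
    if population < 0:
        raise ValueError("population must be non-negative")
    if population == 0:
        return 0  # zero is kept separate to reflect enabledness
    return 1 + bisect.bisect_left(list(thresholds), population)
```

For the thresholds 20, 50 and bound 1000 of Fig. 2, this yields the four levels 0, 1-20, 21-50, and above 50, matching the abstract states shown there.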


Example 2. Consider Gene expression, now with a ‘fast’ blocking where the rate
of b equals $10^{-2}$. A typical simulation can be seen in Fig. 2 (right, dotted green
line): the number of proteins grows to several dozen, then blocking takes place,
followed by extinction. The semi-quantitative abstraction for thresholds 10, 20, 50
yields the model in Fig. 3(a). In contrast to classic abstractions, there are no
self-loops and the abstract transitions are assigned concrete rates. One can see
that the blocking can in principle take place at any population and that the
population can decrease also when DNA is on, i.e. in states [1,0,-]. However, all
this happens with very low probabilities and the model captures this only
indirectly through the numerical labelling. This is made explicit during the
semi-quantitative analysis.


2.2 Semi-quantitative Analysis 


Key Idea. The aim is to prune the abstraction so that only reasonably probable
behaviour is reflected, see the thick transitions in the abstraction in Fig. 2 (left
bottom). To this end, we preserve in each state only the transitions with the
highest rate h or almost highest rates, i.e. with h' ≥ h/envelope, where
envelope ≥ 1 is a parameter. Parameter values in [1,10] restrict attention to rates
of the same order of magnitude, i.e. to the most probable events and those with,
e.g., only a 20% chance of happening. Higher values then allow for inspection of
even less probable behaviours.

Fig. 3. (a) and (b): ‘Fast’ Gene expression with thresholds 10, 20, 50. (a) depicts
the full abstraction and (b) depicts the pruned abstraction with envelope = 3.
(c)-(e): ‘Slow’ Gene expression with thresholds 20, 50, 80, 150. (d) and (e) depict
the pruned abstraction with envelope = 3 and 1, respectively.

Consequently, the method can naturally handle uncertainty in the reaction 
rates since typically only the relative magnitudes of the rates are important, 
actually, only their orders of magnitude. This robustness w.r.t. the input is very 
beneficial for biologists as the precise rates are often not known. 
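
The envelope-based pruning of a single state can be sketched in a few lines of Python. The function name prune and the example rates are hypothetical; only the rule "keep a transition if its rate is at least h/envelope" is taken from the description above.

```python
def prune(transitions, envelope=3.0):
    """Keep the transitions whose rate is within a factor `envelope` of the
    highest outgoing rate h.

    transitions: dict mapping a successor state to its rate.
    """
    if envelope < 1:
        raise ValueError("envelope must be at least 1")
    if not transitions:
        return {}
    h = max(transitions.values())
    return {target: rate for target, rate in transitions.items()
            if rate >= h / envelope}


# Hypothetical outgoing rates of one abstract state.
outgoing = {"grow": 0.36, "block": 0.2, "decay": 0.01}
print(prune(outgoing, envelope=3))  # keeps "grow" and "block"
print(prune(outgoing, envelope=1))  # keeps only "grow"

# Scaling all rates by the same factor does not change what is kept,
# which is the robustness with respect to concrete rates noted above.
scaled = {target: 10 * rate for target, rate in outgoing.items()}
print(set(prune(scaled, envelope=3)) == set(prune(outgoing, envelope=3)))
```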


Example 3. The analysis of the previous ‘fast’ Gene expression with envelope = 3
is depicted in Fig. 3(b). It shows the most probable behaviours: the fast
growth until the intervals 2 and 3 (i.e. 10-20 and 20-50) and not beyond to
4 (over 50), followed by a slower decline. The computed rates induce expected
times to pass through a state, matching closely those of the simulation in Fig. 2
(right, dotted green line). Moreover, we see that the blocking transition from
interval 2 has a lower rate than the production and is thus less probable. Under
a stricter envelope = 2, it would not even appear as a probable transition.


Example 4. A more complicated behaviour arises when the blocking is slow, with
rate $10^{-3}$ as in Sect. 1. A simulation run for this case is depicted in Fig. 2
(right, full blue line). One can observe a more balanced competition between
blocking and oscillation around 70-100 proteins. Similarly, while the full
abstraction (not shown here) features arbitrary oscillations (also back to no
proteins at all), after analysis the pruned abstraction faithfully models the
initial growth, subsequent oscillation only in the range of higher populations,
followed by blocking and gradual extinction of proteins, see Fig. 3(c).


Technically, the analysis relies on repeated alternation of transient and
steady-state analysis. First, starting from the initial state, we follow in each
state only the transitions with the highest rates (the most probable ones), until
the set of explored states reaches a fixpoint. A part of the created graph is
recurrent and forms a bottom strongly connected component (BSCC) or a collection
thereof. The system temporarily settles in the steady state of this BSCC. After
some time has passed, a less probable transition also happens almost surely and
the “BSCC” is exited. These exit points are identified by a steady-state analysis
of the BSCC, taking the magnitudes of exiting and non-exiting transition rates into
account. The exit points trigger a new iteration of the transient and then the
steady-state analysis.
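
One iteration of this scheme can be sketched on a toy graph as follows. This is a simplified illustration and not the tool's algorithm: exit points are merely listed as the transitions leaving a BSCC instead of being selected by a steady-state analysis, and the graph, its rates, and the function names are hypothetical.

```python
def explore(graph, start, envelope=1.0):
    """Follow only the (almost) highest-rate transitions until a fixpoint."""
    pruned, stack = {}, [start]
    while stack:
        s = stack.pop()
        if s in pruned:
            continue
        out = graph.get(s, {})
        h = max(out.values(), default=0.0)
        kept = {t: r for t, r in out.items() if r >= h / envelope}
        pruned[s] = kept
        stack.extend(t for t in kept if t not in pruned)
    return pruned


def reachable(pruned, s):
    """All states reachable from s in the pruned graph, including s."""
    seen, stack = {s}, [s]
    while stack:
        u = stack.pop()
        for v in pruned[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def bsccs(pruned):
    """Bottom SCCs: sets of states that all reach each other and nothing else."""
    reach = {s: reachable(pruned, s) for s in pruned}
    return {frozenset(reach[s]) for s in pruned
            if all(s in reach[t] for t in reach[s])}


def exits(graph, component):
    """Transitions of the unpruned graph that leave the component."""
    return {(s, t): r for s in component
            for t, r in graph.get(s, {}).items() if t not in component}


# Hypothetical abstract model: growth, oscillation, and rare blocking.
graph = {
    "on_0": {"on_mid": 10.0},
    "on_mid": {"on_high": 1.0, "off": 0.05},
    "on_high": {"on_mid": 1.0, "off": 0.08},
    "off": {},
}
pruned = explore(graph, "on_0")
for component in bsccs(pruned):
    print(sorted(component), exits(graph, component))
```

On this toy model the first iteration finds the oscillation between on_mid and on_high as a temporary steady state; the two low-rate blocking transitions are the candidates from which a second iteration would start.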


Example 5. Figure 3(d) illustrates a situation with two iterations using the slow
variant of the model. Decreasing envelope to 1 causes the blocking reaction
to be explored only in the second iteration, as an exit of the BSCC found in the
first iteration. Before that exit happens, the “BSCC” represents a “temporary”
steady state of the system.

Note on Correctness. As discussed in [9], the semi-quantitative analysis provides
guarantees in the form of limit behaviour and convergence: firstly, the precision
grows with the differences in the orders of magnitude of the involved rates: as
their ratios tend to infinity, the error tends to zero; secondly, as the population
discretization gets finer, the error in the new “accelerated” transitions is reduced,
trivially being zero for complete refinement into singletons.


2.3 Visualization of Qualitative Information 


A proper visualization is essential for clear presentation and easy interpretation 
of the results of our analysis. To this end, the tool and its GUI offer various 
options for visualizing the results. The basic ones, related to the graph structure, 
are the following. Further options, with more quantitative flavour, are discussed 
in the next section, followed by an example illustrating all of them. 


Iterations. As the complete abstract model is typically very large and chaotic, 
further structuring is necessary. Therefore, the default view shows the states 
arranged and grouped into separate blocks, one for each iteration, additionally 
coloured distinctly for each iteration. Besides, we can restrict which iterations we 
show. This is useful to zoom in and investigate a particular part of the behaviour. 


Intra-iteration SCCs (IISCCs). Additionally, the arrangement and colouring
can be based on aggregating SCCs within each iteration (IISCCs). This helps to
understand the emergence of repetitive behaviour patterns, such as oscillation or
a (temporary) steady state. It can also be combined with the iteration grouping.


Collapsed Views. In order to understand the system behaviour, one typically
needs to have a synoptic overview of the system. For more complex systems,
even the pruned abstraction could become too large and the view of the fully
expanded system might not be sufficiently compact. In such cases, the aggregates
discussed in the previous views, i.e., iterations and IISCCs, can be collapsed
into single nodes, hiding the complexity of the exact behaviour pattern within
these areas. This allows us, for instance, to ignore the particular (temporary)
oscillation or steady state in these states and to focus on more global behaviour,
such as what happened before and after this behaviour and how often it
arises. In contrast to zooming in by restricting to certain iteration(s) only, the
collapsed views provide a means to zoom out.


2.4 Visualization of Quantitative Information 


The produced graphs are also labelled with numerical information. While the
quantities cannot be precise due to the simplifications of the extremely scalable
analysis, they match the orders of magnitude of the observed quantities, which
is often precise enough for biological purposes; for instance, the peak of protein
growth happens after units vs. dozens of seconds in the fast and slow variants
of Gene expression, respectively.

Transient Analysis. Firstly, each abstract transition is labelled with a rate
corresponding (in the order of magnitude) to the rate of the concrete transition
(or accelerated transition, i.e. a “sequence” of transitions) of a “typical”
representative of the abstract state. These rates induce the expected time spent
in each transient state of each iteration. Indeed, the waiting time is simply the
inverse of the sum of the outgoing rates. Further, each BSCC of each iteration
is labelled with an estimate of the time before it is left for the next iteration.
This is a key notion, which allows us to easily provide transient timing
information for very stiff systems (working at different time scales). Consider the
simple gene model. From Fig. 3(b) and (d) we can easily compute the expected time
to extinction (as the sum of the exit times of all SCCs on the inspected path). Our
analysis correctly estimates that the expected extinction time is around 24 for
the fast variant and 40 for the slow variant.
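
The timing rule can be written down directly. The following Python sketch uses hypothetical rates and function names; it only illustrates that the expected waiting time in a state is the inverse of the sum of its outgoing rates and that times along a path add up.

```python
import math


def expected_waiting_time(outgoing_rates):
    """Expected time spent in a state: 1 / (sum of outgoing rates)."""
    total = sum(outgoing_rates)
    if total <= 0:
        return math.inf  # absorbing state: it is never left
    return 1.0 / total


def expected_path_time(path_rates):
    """Sum of expected waiting times over the states of a path."""
    return sum(expected_waiting_time(rates) for rates in path_rates)


# Hypothetical outgoing rates of three consecutive transient states.
path = [[1.0], [0.36, 0.04], [0.1]]
print(expected_path_time(path))  # roughly 1 + 2.5 + 10 time units
```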


Steady State Analysis. In many biological models, the natural steady state is
either extinction or unbounded explosion. Hence it does not say much about the
“seemingly steady” state (the temporary steady state), i.e., behaviour that is
stable for a long but finite time. Therefore, the tool provides information not only
on the steady state of the whole system, but also for each iteration separately,
since the iterations represent the temporary steady states discussed above. Both
can be visualized as a colouring of states, with higher probabilities corresponding
to darker colours, immediately giving a synoptic view of frequent behaviours.


Correlations. Finally, correlations between population sizes can be observed as
follows. The GUI can be given a set of equivalences of the form m ~ n for species
i, j, meaning that if a state has (abstract) population m of species i and n of j,
then it is regarded as satisfying the correlation in question. It is coloured
accordingly, and the overall colouring of the system provides further indication
under which behaviour or in which phases the correlation holds.
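
The correlation check amounts to a simple membership test, sketched below in Python. The data layout (a state as a dictionary from species to abstract level, a correlation as a tuple (i, m, j, n)) and the level names are assumptions of this sketch, not the input format of the GUI.

```python
def satisfies(state, correlations):
    """True if the state matches at least one equivalence m ~ n.

    state: dict mapping a species name to its abstract population level.
    correlations: iterable of (species_i, m, species_j, n) tuples.
    """
    return any(state.get(i) == m and state.get(j) == n
               for i, m, j, n in correlations)


# Hypothetical levels: a high RNA level goes with active DNA,
# and no RNA goes with blocked DNA.
correlations = [("RNA", "high", "DNA", "on"), ("RNA", "zero", "DNA", "off")]
states = [
    {"RNA": "high", "DNA": "on"},
    {"RNA": "high", "DNA": "off"},
    {"RNA": "zero", "DNA": "off"},
]
print([satisfies(s, correlations) for s in states])
```

States for which the test succeeds would receive the correlation colour in the visualization.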


Example 6. We demonstrate these visualization options in Fig. 4 on a more
complicated gene expression model [16], a model widely used for benchmarking CRN
analyzers. As reported in [16,18], the behaviour oscillates between two steady
states with DNA on and DNA off. Moreover, there is a correlation between high
amounts of RNA present and DNA being on, and no RNA with DNA off.

The complete system and its steady-state distribution are depicted in
part a) using the iteration and IISCC arrangement. This view shows immediately,
without any details, that the only interesting states are in iteration 1,
which includes all states with a high steady-state probability (the red colouring).
Therefore, in part b), we zoom in to iteration 1 and use the IISCC arrangement.
In order to observe the interesting switches between the temporary steady states,
we collapse the IISCCs in part c) and thus ignore the internal (non-interesting)
behaviour of the big IISCC. Finally, in part d), we use the correlation
colouring to identify states where the required correlation holds (i.e.
the blue states). Comparing parts c) and d) immediately reveals that the system
spends the majority of the time in the states where the correlation holds.

Fig. 4. A visualisation of the workflow for the extended gene expression model. (Color
figure online)


2.5 Precision and Refinement 


So far, we have illustrated the concepts and the functionality on models with
an appropriate level of abstraction. However, it often happens that we start
the investigation with an abstraction that is too coarse. Whenever this happens,
it is important to notice it and to refine the abstraction appropriately. While [9]
does not discuss this issue, the tool provides support for it.


Precision Parameters. There are several knobs for trading the size and the 
precision of the abstraction. They all come as input in the lower half of the 
Editor tab: discretization, bound, and envelope. 


Example 7. Recall the initial abstraction for the Gene expression of Fig. 2 (with
rate $10^{-3}$). The abstraction, using thresholds 20, 50, predicts an oscillation
including low populations of P (1-20), which is not correct (recall that P
oscillates at high populations before the blocking reaction occurs). Figure 3(c)
and (d) show the abstraction and the consequent analysis and visualization for a
refined model using thresholds 20, 50, 80, 150 (instead of just 20, 50). As
discussed, this abstraction correctly predicts the system behaviour.


Discretization. The basic building block of each abstraction is the level of 
detail it preserves in the abstract states. Firstly, it determines how precisely 
we can observe the evolution of the population. For instance, if we want 
to detect whether a population typically grows beyond a bound or oscillates in 
a certain interval, that interval should be present in the discretization. 
Secondly, the discretization should be fine enough so that in each state the 
rates are reasonably precise (in orders of magnitude). Fortunately, their 
absolute precision is not vital in our analysis: we only need the relative 
proportions of the rates to have the right magnitude to decide which behaviour 
is probable. Consequently, an abstraction that is too rough manifests itself as 
“non-determinism”, where a state has two transitions with similar rates. In such 
a case, the probable behaviour cannot be determined. Therefore, the Colorization 
pane of the Visualization tab offers an option to display suggestions for 
refinement; it highlights the non-deterministic states, which are the natural 
candidates for refinement. Note that we highlight only the states where the two 
transitions lead to mutually different SCCs, so that a significant change in 
behaviour may occur. 
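
To make the role of the thresholds concrete, the following minimal Python sketch 
maps a concrete population to the index of its abstract level. The function name 
`level` and the half-open interval convention are assumptions made for 
illustration only; the interval boundaries used by the tool may differ. 

```python
from bisect import bisect_right


def level(population, thresholds):
    """Return the index of the abstract level containing `population`.

    Assumed convention (not taken from the tool): thresholds [20, 50]
    induce the intervals [0, 20), [20, 50) and [50, infinity).
    """
    return bisect_right(thresholds, population)


coarse = [20, 50]
refined = [20, 50, 80, 150]

# Populations 60 and 200 are indistinguishable under the coarse thresholds...
same_coarse = level(60, coarse) == level(200, coarse)
# ...but fall into different levels once the thresholds are refined.
same_refined = level(60, refined) == level(200, refined)
```

Under the coarse thresholds all populations from 50 upwards collapse into one 
level, so an oscillation confined to high populations cannot be told apart from 
one that also visits moderately high populations; the refined thresholds 
separate the two. 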


Bounds. Similarly, for the single infinite interval (t_n, ∞), the tool takes as 
input a bound that is believed to be a safe upper bound on the population of the 
species. Of course, it may be wrong. This is irrelevant when the population 
explodes beyond all bounds. However, whenever there are transitions from the 
highest level back to a lower one, their feasibility and rates are in question. 
Optimally, such states do not even occur in the pruned abstraction. If they do, 
we also highlight them (in another colour) using the Colorization for Refinement 
suggestions. 
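
The check described above can be sketched as a simple filter. The data layout 
(triples of source level, target level, and rate for a single species) and the 
function name are hypothetical and serve only to illustrate which transitions 
would be flagged; they do not reflect the tool's internal representation. 

```python
def questionable_transitions(transitions, top_level):
    """Return the transitions that leave the unbounded top level downwards.

    `transitions` is an iterable of (source_level, target_level, rate)
    triples; this layout is an assumption made for the example.
    """
    return [
        (source, target, rate)
        for (source, target, rate) in transitions
        if source == top_level and target < top_level
    ]


example = [(0, 1, 2.0), (2, 1, 0.3), (2, 2, 1.0)]
flagged = questionable_transitions(example, top_level=2)
```

Only the transition from level 2 back to level 1 is flagged, since its rate 
depends on the assumed bound for the top level. 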


Envelope. While abstractions that are too rough introduce too much 
non-determinism, the degree of non-determinism is, dually, determined (even 
defined) by the envelope: the factor between rates up to which even the less 
probable option is still taken into account (and thus introduces 
non-determinism). Consequently, high values of the envelope introduce 
non-determinism, so that the analysis also takes less important behaviour into 
account; in contrast, low values make the analyzed system deterministic, showing 
only the most probable behaviour. The choice of the envelope thus depends on 
whether such less probable behaviours should also be reported. 
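
One possible reading of the envelope is as a cut-off relative to the fastest 
outgoing transition of a state. The sketch below follows that reading; the 
function name `kept_transitions` and the exact comparison are assumptions, and 
the tool's definition may differ in detail. 

```python
def kept_transitions(rates, envelope):
    """Keep the transitions whose rate is within `envelope` of the maximum.

    `rates` maps each successor state to the rate of the transition to it.
    A transition is kept if rate * envelope >= maximal rate (assumed rule).
    """
    top = max(rates.values())
    return {succ: r for succ, r in rates.items() if r * envelope >= top}


rates = {"a": 100.0, "b": 20.0, "c": 0.5}

only_dominant = kept_transitions(rates, envelope=1)     # deterministic
two_options = kept_transitions(rates, envelope=10)      # non-deterministic
all_options = kept_transitions(rates, envelope=1000)    # everything kept
```

With envelope 1 only the dominant transition survives; raising the envelope 
admits progressively less probable alternatives, and every state with more than 
one surviving transition is non-deterministic in the sense used above. 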


2.6 Mean Simulations 


Since our models, although abstract, have an operational semantics, we can even 
run simulations on them. Moreover, the accelerated transitions, being “sequences” 
of transitions, have a low variance of their execution time, by the law of large 
numbers. Hence their execution time can be chosen quite precisely in a 
deterministic way. Similarly, the time to leave an IIBSCC is quite deterministic. 
Thus we can generate simulations where the only random decisions are the choices 
of transitions, while the timing follows the mean time of the respective events. 
Moreover, runs within the pruned abstraction reflect the most important 
behaviours only. 
Such mean simulations, which can thus be generated from our analysis, represent 
groups of typical runs (modulo small time shifts and the order of transitions 
within an SCC, which are not very relevant). Therefore, a few such simulations 
reflect all the behaviours present (at the desired level of significant 
probability) and can serve to observe multi-modalities, bifurcations, rough 
transient timing, as well as frequencies in the steady state and temporary 
steady states. To the best of our knowledge, such a concept has not yet been 
considered for the simulation of stochastic systems. 
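
The idea of a mean simulation can be sketched in a few lines of Python. This is 
a simplified illustration, not the tool's implementation: the abstract model is 
assumed to be given as a dictionary of outgoing rates, the mean time of a step 
is taken to be the reciprocal of the total outgoing rate, and all names are 
hypothetical. 

```python
import random


def mean_simulation(transitions, start, steps, seed=0):
    """Generate one run with random branching but deterministic timing.

    `transitions` maps a state to a dict {successor: rate}. The successor
    is drawn with probability proportional to its rate, while the clock
    advances by the mean sojourn time 1 / (sum of outgoing rates) instead
    of an exponentially distributed sample.
    """
    rng = random.Random(seed)
    time, state = 0.0, start
    trajectory = [(time, state)]
    for _ in range(steps):
        outgoing = transitions.get(state)
        if not outgoing:
            break  # absorbing state: nothing more to simulate
        successors = list(outgoing)
        weights = [outgoing[s] for s in successors]
        time += 1.0 / sum(weights)  # mean time, no random delay
        state = rng.choices(successors, weights=weights, k=1)[0]
        trajectory.append((time, state))
    return trajectory


model = {"low": {"high": 0.5}, "high": {"low": 0.25}}
run = mean_simulation(model, start="low", steps=2)
```

In this toy model each state has a single successor, so the run is fully 
deterministic; with several successors per state, different seeds yield 
different branches, while the timing along each branch stays the same. 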


[Figure 5 plot "Trajectory": population levels (vertical axis) against time 
(horizontal axis, 0 to 85).] 

Fig. 5. Mean simulation for the slow 
variant of Gene expression, directly 
comparable to Fig. 2 (right, full line). 


Example 8. Figure 5 shows an abstract simulation for our running example with 
discretisation thresholds 20, 50, 80, 150. One can readily observe its agreement 
with the typical stochastic simulation in Fig. 2 (right, full blue line). 


3 Conclusion 


We have presented SeQuaiA, a scalable tool for robust and explainable analysis 
of CRNs. The analysis is sufficiently precise, as cross-validated against 
simulation-based results on several models widely used in the literature. A key 
contribution of the tool is its visualization, which is essential for a clear 
presentation and easy interpretation of the analysis results. 